Culture War Roundup for the week of January 15, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

A limitation of usual Bayesian reasoning.

Scott is doing his annual subscription drive and I was reminded of a (still) private post of his I disagree with: https://www.astralcodexten.com/p/but-seriously-are-bloxors-greeblic

In my post on uncertainty around AI, I wrote:

If you have total uncertainty about a statement (“are bloxors greeblic?”), you should assign it a probability of 50%. If you have any other estimate, you can’t claim you’re just working off how radically uncertain it is. You need to present a specific case.

Commenters were skeptical! I agree this important topic needs more discussion:

And then he proceeded to list some of the objections and his responses to them. The objection I'm personally most partial to was not listed, so I assume it's a somewhat novel idea, at least in that (and this) community.

Suppose that in your travels you encounter a shady guy who offers you an opportunity to bet on the outcome of a coin flip. Nearby stands a yudkowsky, who tells you that according to his observations the coin is biased and the next flip is about 66% likely to land on heads. You know that yudkowskis are honest and good Bayesians, so you trust his assessment.

The shady guy flips the coin and it lands on tails. Now consider two possible worlds: in one the yudkowsky says that his new estimate is 50% heads, in another he says that he has updated to 65% heads. That's two very different worlds! It turns out that the yudkowsky has an important parameter: how many coinflips he has observed so far, and therefore how much of his estimation comes from the observations and how much from the prior, and for some reason he doesn't tell you its value!
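To make the two worlds concrete, here is a minimal sketch. It assumes a uniform Beta(1, 1) prior on the coin's heads-propensity, which is an assumption because the yudkowsky's actual prior is never stated, and it reads "about 66%" as exactly 2/3. One hypothetical yudkowsky has seen a single head; the other has seen 25 heads and 12 tails. Both currently say 2/3, but after one tails the first moves to 1/2 and the second only to 13/20 (65%).

```python
from fractions import Fraction


def posterior_mean(heads: int, tails: int, prior_a: int = 1, prior_b: int = 1) -> Fraction:
    """P(next flip is heads) under a Beta(prior_a, prior_b) prior after the given flips."""
    return Fraction(prior_a + heads, prior_a + prior_b + heads + tails)


# Two hypothetical yudkowskis with the same current estimate but different histories.
histories = {"saw 1 flip": (1, 0), "saw 37 flips": (25, 12)}
for label, (heads, tails) in histories.items():
    before = posterior_mean(heads, tails)
    after = posterior_mean(heads, tails + 1)  # the shady guy's flip lands tails
    print(f"{label}: before={before}, after={after}")
```

The only difference between the two is a + b + n, the effective number of observations behind the estimate, which is exactly the parameter the yudkowsky doesn't volunteer.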

Scott's assertion is correct in a narrow technical sense: in a world where the shady stranger forces you to make a bet at gunpoint, you are forced to use the yudkowsky's estimation and the yudkowsky is forced to use a symmetric prior that gives him a 50% probability of heads when he has not seen any flips at all yet.

However in the real world there's almost always an option to wait and collect more data, and whether you want to exercise it critically depends on the difference between "it's a 50/50 chance based on observing 100 coinflips" and "it's a 50/50 chance based solely on the prior I pulled out of my ass".
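A minimal sketch of why this distinction matters for the option to wait. It rests on assumptions that are not in the post: an even-money $1 bet on heads that you may decline, a uniform Beta(1, 1) prior, and one free observation before you decide (real waiting usually has a cost). Both beliefs say 50% today, so betting now is worth nothing either way. Watching one flip first is worth 1/6 with the bare prior, but only 1/206 after 50 heads and 50 tails.

```python
from fractions import Fraction


def bet_value(p_heads: Fraction) -> Fraction:
    """Expected gain of the better choice between an even-money $1 bet on heads and declining."""
    return max(Fraction(0), 2 * p_heads - 1)


def value_now(a: int, b: int) -> Fraction:
    """Decide immediately using the Beta(a, b) mean a / (a + b) as P(heads)."""
    return bet_value(Fraction(a, a + b))


def value_after_one_free_flip(a: int, b: int) -> Fraction:
    """Watch one flip for free, update Beta(a, b), then decide whether to bet on a later flip."""
    p = Fraction(a, a + b)
    after_heads = bet_value(Fraction(a + 1, a + b + 1))
    after_tails = bet_value(Fraction(a, a + b + 1))
    return p * after_heads + (1 - p) * after_tails


beliefs = {"prior only": (1, 1), "after 50 H and 50 T": (51, 51)}
for label, (a, b) in beliefs.items():
    print(label, value_now(a, b), value_after_one_free_flip(a, b))
```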

So what I think is going on is that people intuitively understand that there's this important difference, and suspect that when Scott says that normally they should start with a 50/50 prior, he's trying to swindle them into accepting Bayesians' estimates without asking how sure they are about them. And rightfully so, because that's a valid and important question to ask, and honestly Bayesians ought to get a habit of volunteering this information unprompted, instead of making incorrect technical arguments insinuating that the estimated probability alone should be enough for everyone.

I think there are two important lenses here.

Via the probability-theory lens, we must distinguish between

  • the propensity for the coin to land on heads - unknown
  • the subjective (in the Bayesian sense) probability Yudkowsky assigns to the coin landing on heads on the next flip

Under a Bayesian epistemology, the former is reasoned about using a probability density function (PDF) by which (approximately) every number between 0 and 1 is assigned a subjective probability. Then, when we observe the flip we update using the likelihood function (either x for heads or 1-x for tails). What you're talking about is essentially how spread out Yudkowsky's current PDF is.
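Here is a rough numerical sketch of this picture. A 1000-bin grid on [0, 1] stands in for the PDF, and both the grid size and the two priors are illustrative choices. The first prior is flat. The second is peaked as if 50 heads and 50 tails had already been seen. Both give P(heads) = 0.5, but their standard deviations differ (about 0.29 versus about 0.05). One tails moves the first to about 1/3 and the second only to about 0.495.

```python
from math import sqrt

N_BINS = 1000
GRID = [(i + 0.5) / N_BINS for i in range(N_BINS)]  # midpoints of bins on [0, 1]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def update(pdf, heads):
    """Multiply by the likelihood of one flip (x for heads, 1 - x for tails) and renormalize."""
    return normalize([p * (x if heads else 1 - x) for p, x in zip(pdf, GRID)])


def mean_and_sd(pdf):
    """Mean (= P(next flip is heads)) and spread of the discretized PDF."""
    mean = sum(p * x for p, x in zip(pdf, GRID))
    var = sum(p * (x - mean) ** 2 for p, x in zip(pdf, GRID))
    return mean, sqrt(var)


flat = normalize([1.0] * N_BINS)                           # no flips seen yet
peaked = normalize([x**50 * (1 - x) ** 50 for x in GRID])  # as if 50 H and 50 T were seen
for name, pdf in (("flat", flat), ("peaked", peaked)):
    print(name, mean_and_sd(pdf), mean_and_sd(update(pdf, heads=False)))
```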

The other lens is markets-based, which I've touched on before. Briefly, for reasons that are obvious to anyone in finance, there is a world of difference between

  • believing a stock is worth X
  • offering to buy the stock for X+0.01 from anyone and sell it for X-0.01 to anyone

In real life, the bid-ask spread that market makers offer depends on a great number of factors, including how informed everyone else in the market is relative to themselves. Through this lens, credible intervals (or whatever phrase you want to use) are not things individuals have in isolation; they are things individuals have within a social space: if you're with a bunch of first-graders, you might have a very tight bid-ask spread when betting on whether a room-temperature superconductor was just discovered; if you're with a bunch of chemistry PhDs, you're going to adopt an extremely wide spread (e.g. "somewhere between 5% and 95%").

There's also the problem that 50-50 is not actually a neutral probability if you're a coherent Bayesian and you don't have an ultra-simple sample space. For example, if I think that the probability of each possible bloxor being greeblic is 50%, then I am committed to thinking that the probability that at least 70 of 100 bloxors are greeblic is about 0.004%. So my "neutral" prior commits me to extremely strong confidence that fewer than 70 of those 100 bloxors are greeblic!

If I set my prior for each bloxor being greeblic to 69.5%, then it is approximately neutral with respect to at least 70 of 100 bloxors being greeblic. But now I'm obviously far from neutral with respect to any individual bloxor being greeblic.
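The figures can be checked with an exact binomial calculation, assuming (as the comment implicitly does) that bloxors are greeblic independently of one another. Under that assumption, the chance that at least 70 of 100 are greeblic is about 0.004% at p = 0.5 and close to one half at p = 0.695. The chance of exactly 70 is smaller still, about 0.002%.

```python
from math import comb


def prob_exactly(k: int, n: int, p: float) -> float:
    """P(X == k) for X ~ Binomial(n, p): n independent bloxors, each greeblic with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for the same binomial model."""
    return sum(prob_exactly(j, n, p) for j in range(k, n + 1))


print(prob_exactly(70, 100, 0.5))     # roughly 2e-05
print(prob_at_least(70, 100, 0.5))    # roughly 4e-05, i.e. about 0.004%
print(prob_at_least(70, 100, 0.695))  # close to 0.5
```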

This is one of the limitations of Bayesianism as a formalism: it can model neutral belief with respect to any individual partition of the sample space, but not all partitions of the sample space. So, Scott is just wrong and frankly hasn't understood the mathematics, given his statement "If you have total uncertainty about a statement (“are bloxors greeblic?”), you should assign it a probability of 50%," since this norm implies incoherence, but coherence is a fundamental Bayesian norm.

Put briefly, what Scott is saying requires that you reject Bayesian epistemology/decision theory. I haven't read the whole post yet, but I would be surprised if he realised that.

A different model solves this. If you treat the proportion of greeblic bloxors as an unknown parameter, then assign a prior to that parameter, you can have both

  1. a single bloxor has a 50% chance of being greeblic

  2. the chance of 70/100 bloxors being greeblic is not negligible

This works because the bloxors are no longer independent; they are related through the proportion parameter. Observing one bloxor would change your belief about the parameter, and thus about the other bloxors.
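Here is a minimal sketch of this model. It assumes a uniform Beta(1, 1) prior on the greeblic proportion. Any prior symmetric about 1/2 would give the 50% in point 1; the uniform one is just the simplest. Under it, every count from 0 to 100 is equally likely, so P(exactly 70/100) = 1/101 and P(at least 70/100) = 31/101. Seeing one greeblic bloxor raises the chance for the next one to 2/3 (Laplace's rule of succession).

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(x: int, y: int) -> Fraction:
    """Beta function B(x, y) for positive integers: (x-1)!(y-1)!/(x+y-1)!."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))


def beta_binomial_pmf(k: int, n: int, a: int = 1, b: int = 1) -> Fraction:
    """P(exactly k of n bloxors are greeblic) when the greeblic proportion ~ Beta(a, b)."""
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)


def predictive_next(greeblic_seen: int, seen: int, a: int = 1, b: int = 1) -> Fraction:
    """P(next bloxor is greeblic | observations): the posterior mean of the proportion."""
    return Fraction(a + greeblic_seen, a + b + seen)


n = 100
print(predictive_next(0, 0))                                    # single bloxor, nothing observed
print(beta_binomial_pmf(70, n))                                 # exactly 70 of 100
print(sum(beta_binomial_pmf(k, n) for k in range(70, n + 1)))  # at least 70 of 100
print(predictive_next(1, 1))                                    # after one greeblic bloxor
```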

A sufficiently large number of conjunctions of single-case hypotheses of the form "bloxor x is greeblic" regenerates the problem. I put it in terms of proportions for familiarity's sake, but formally the point is easier to see if you consider Boolean operations on the elements of partitions, and note that in Bayesian epistemology the sample space is assumed to be closed under Boolean operations.

That was one of the objections listed in the post, Scott's response was that you should only be neutral about elementary propositions, not about compound ones ("bloxors are greeblic AND bloxors are grue").

I personally think that this entire kind of objections can be dismissed by pointing out that Bayesian math works correctly and without contradictions, and when looking at actual priors there's not much disagreement about how to choose them either, in practice. Nobody actually has arguments against assigning a symmetric prior to a coin bias, or can even muster much enthusiasm to argue that you should use a Gaussian instead of a uniform prior.

People get hot and bothered when they feel that someone tries to hide how much information they have actually updated on and how much is their prior.

That was one of the objections listed in the post, Scott's response was that you should only be neutral about elementary propositions, not about compound ones ("bloxors are greeblic AND bloxors are grue").

How do I know that "bloxor-1 is greeblic" is elementary, if I am totally uncertain about this proposition, and I don't even understand the terms? Additionally, it's arbitrary to say that one should be neutral about the elementary propositions.

I personally think that this entire kind of objections can be dismissed by pointing out that Bayesian math works correctly and without contradictions

What do you mean "correctly"?

when looking at actual priors there's not much disagreement about how to choose them either, in practice

Depends. If you interpret the probabilities as subjective degrees of belief and interpret degrees of belief in terms of idealised betting dispositions, then it's not obvious that people can introspect their own odds. Experimental work from about the Allais paradox onwards doesn't suggest that Bayesianism is a good fit with how humans actually reason under uncertainty, and without some evidence of reliability of personal introspection of priors, "My prior is X" is potentially just hot air.
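For readers who haven't seen it, the standard textbook version of the Allais gambles can be checked mechanically. The prizes below are the usual ones, not taken from this thread. For any utility function whatsoever, the expected-utility gap between 1A and 1B equals the gap between 2A and 2B. So the commonly observed pattern of choosing 1A and 2B cannot come from maximizing expected utility.

```python
import random
from fractions import Fraction as F

# Standard textbook Allais gambles (assumed; the thread does not spell them out).
# Each gamble maps a prize in millions of dollars to its probability.
GAMBLE_1A = {1: F(1)}
GAMBLE_1B = {1: F(89, 100), 0: F(1, 100), 5: F(10, 100)}
GAMBLE_2A = {0: F(89, 100), 1: F(11, 100)}
GAMBLE_2B = {0: F(90, 100), 5: F(10, 100)}


def expected_utility(gamble, u):
    """Expected utility of a gamble under a utility table u: prize -> utility."""
    return sum(p * u[prize] for prize, p in gamble.items())


rng = random.Random(0)
for _ in range(1000):
    # Try many random utility tables; the identity needs no monotonicity or concavity.
    u = {prize: F(rng.randint(0, 10**6)) for prize in (0, 1, 5)}
    gap1 = expected_utility(GAMBLE_1A, u) - expected_utility(GAMBLE_1B, u)
    gap2 = expected_utility(GAMBLE_2A, u) - expected_utility(GAMBLE_2B, u)
    assert gap1 == gap2  # so 1A over 1B forces 2A over 2B for an expected-utility maximizer
print("EU(1A) - EU(1B) == EU(2A) - EU(2B) for every utility table tried")
```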

Nobody actually has arguments against assigning a symmetric prior to a coin bias

How many of the arguments in probability theory have you read to come to this judgement? Because I can think of large parts of the literature dedicated to exactly this point.

How do I know that "bloxor-1 is greeblic" is elementary, if I am totally uncertain about this proposition, and I don't even understand the terms?

Skill issue.

What do you mean "correctly"?

That I, doing Bayesian math about some bets against you, will leave you poor and destitute in the long run, unless you're using Bayes too. What do you want to use instead of Bayes for the record?
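The usual formal backing for this claim is the (synchronic) Dutch book argument. Here is a minimal sketch with a hypothetical agent whose credences in A and not-A sum to 1.1. If the agent will buy a ticket paying $1 on each event at a price equal to its own credence, a bookie profits $0.10 whichever way A turns out. If the credences summed to less than 1, the bookie would buy the tickets from the agent instead.

```python
from fractions import Fraction as F


def dutch_book_profit(credences, outcome):
    """Bookie's profit from selling the agent a $1 ticket on each event at the agent's credence.

    credences: dict mapping event -> price the agent considers fair for that ticket.
    outcome: the set of events that actually occur.
    """
    paid = sum(credences.values())
    payout = sum(1 for event in credences if event in outcome)
    return paid - payout


# Hypothetical incoherent agent: P(A) + P(not A) = 11/10 > 1.
incoherent = {"A": F(1, 2), "not A": F(3, 5)}
for world in ({"A"}, {"not A"}):
    print(world, dutch_book_profit(incoherent, world))  # 1/10 in both worlds
```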

the Allais paradox

My point is not that the poors are always instinctively right. My point is that they have well-honed instincts for when someone is trying to take advantage of them, and the usual Bayesian reasoning like the above rightfully triggers it, even if they don't have the concepts or the introspection to tell us what it was that triggered them.

My point is that a Bayesian megamind is entirely justified in asking the yudkowsky what fraction of his prediction came from the data, and basing his bet amount on that, and grumbling about the yudkowsky being useless if he refuses to answer.

Nobody actually has arguments against assigning a symmetric prior to a coin bias

How many of the arguments in probability theory have you read to come to this judgement? Because I can think of large parts of the literature dedicated to exactly this point.

Huh?

That I, doing Bayesian math about some bets against you, will leave you poor and destitute in the long run, unless you're using Bayes too.

It's possible to set up some types of games where this is true, as well as some types of games in which using Bayesian math can lead to disaster. See this paper for a pretty simple example showing that setting up a game so that Bayesianism looks good is more complex than you seem to think: https://www.jstor.org/stable/40210799

If you're thinking of conditionalization as part of "Bayesian math" and alluding to diachronic Dutch Book Arguments, the problems here are particularly vexing. See here: https://link.springer.com/article/10.1007/s10670-020-00228-1

Richard Pettigrew, who has a background in both mathematics and philosophy, has done a lot of great work on these issues. Here's a brief and relatively simple introduction: http://m-phi.blogspot.com/2018/10/dutch-books-and-conditionalization.html

Basically, the literature thus far has been a long series of failed attempts to squeeze Bayesian epistemological juice out of pragmatic rocks.

What do you want to use instead of Bayes for the record?

The task is underspecified and hence so is your question. Can you explain more?

My point is not that the poors are always instinctively right. My point is that they have well-honed instincts for when someone is trying to take advantage of them, and the usual Bayesian reasoning like the above rightfully triggers it, even if they don't have the concepts or the introspection to tell us what it was that triggered them.

My point is that a Bayesian megamind is entirely justified in asking the yudkowsky what fraction of his prediction came from the data, and basing his bet amount on that, and grumbling about the yudkowsky being useless if he refuses to answer.

I agree.

Huh?

One strand: Bayesians tend to be subjectivists, so symmetric priors are only a personal decision. Another strand: imprecise probabilists (like set-based Bayesians) tend to deny that any additive prior is mandatory (and perhaps not even permissible). Another strand: frequentists are critical of the whole Bayesian enterprise; note that criticisms of frequentists' positive claims are beside the point here.

Of course, all those criticisms of symmetric priors (as mandatory) might be wrong, but it's not true that symmetric priors are uncontroversial, even among people with apparent expertise in the relevant mathematics and logic.

You might say, "Well, obviously if I asked you what the probability of heads is with this perfectly ordinary coin, you'd say 50%." However, we are both far from lacking any evidence with respect to that coin, and "The probability is 50%" can be interpreted in all sorts of different ways, e.g. a frequentist would want to interpret it in terms of hypothetical frequencies in a mathematical model of the coin tossing; some Bayesians would interpret it in terms of degrees of evidential support; other Bayesians would interpret it in terms of degrees of belief; some Bayesians would interpret it in terms of the degrees of belief that a rational person should have given the evidence...

The idea that you can have a prior on bloxors being greeblic strikes me as a type error. The domain of priors are propositions, that is, assignments of truth values to possible world-states, not strings of words; to the extent that we pretend to assign a probability to a string of words, this is only enabled by our understanding that the string encodes a world->bool map (or at least a distribution on such maps, to allow for linguistic uncertainty). Without knowing the definitions of "bloxors" and "greeblic", I'm not aware of any canonical interpretation of this sequence of words that yields a truth value; and it does not seem reasonable to expect that an arbitrary string actually encodes a valid map, any more than it is reasonable to expect that arbitrary line noise encodes a valid polynomial.

In fact, my prior on strings of Latin characters tells me that the bloxors statement is very likely to not encode a map/proposition, and therefore to not have a probability.

The domain of priors are propositions, that is, assignments of truth values to possible world-states, not strings of words

From a mathematical point of view, you can have a probability function defined over all sorts of domains. IIRC, Rudolf Carnap initially defined probability functions over sentences (in the sense of strings of symbols in an artificial language) while John Maynard Keynes and Harold Jeffreys did so over propositions (meanings of sentences) and later Carnap over models (in the formal logic sense). Then there's frequentism and other event-based definitions...

However, I agree with your comment, as we are thinking from the point of view of probability as an epistemologically meaningful magnitude, e.g. a measure of degrees of belief or evidential support. "Bloxors are greeblic" is not part of any language I speak. In general, I shall have at least some background evidence about any proposition in a language I speak, and thus not have pure uncertainty.

I mean, of course I'm not saying it's impossible to define a distribution on arbitrary strings or anything; but I don't think that this is the intended interpretation of any putative "anything has a probability" maxim one would ascribe to LW-style Bayesianism.

That math only applies if “greeblic” is an independent event. If it’s a category, then either (almost?) all bloxors are greeblic, or they aren’t. I think that’s what the original article uses.

Fair point, I was assuming that Scott would think you should also assume independence unless you have evidence otherwise, but I should have stated that assumption.

Scott's claim is about statements, so there's still the problem I mention: 50% with respect to the hypothesis "Almost all bloxors are greeblic" implies very non-neutral beliefs about other statements. Similarly, if it's all bloxors that are being described, then that leaves just 50% of the probability mass to allocate among all the other possible statistical distributions, so e.g. "50% of bloxors are greeblic" and "0% of bloxors are greeblic" can't both have 50% probability as well.

While your idea is somewhat valid, it either misses the point of the question that a Bayesian probability answers, or it ignores that this consideration is already an important part of Bayesian reasoning. In other words, a good Bayesian would say that your idea is trivial and irrelevant unless there is further information acquisition. It is not a "valid and important question to ask" except in some contexts.

In your example, if you can only take the bet once, optimally choosing to take the bet or not involves calculating the expected gain using the correct Bayesian probability. Any other information is irrelevant. In another simple example, you can formulate this as a problem with an option to continue. In that case, there will be an instantaneous (also called flow) payoff and a continuation value (the value of being able to take the bet again). The continuation value depends on the posterior probability which, as you correctly mention, depends on other stuff. However, this continuation value only matters for the decision if it is affected by the decision. If the shady guy will nonetheless toss the coin, then how the posterior probability will change is irrelevant for you.

More generally, dynamic problems with new information are not a problem for Bayesians. Specifying the informational context of a problem requires a proper prior, which is a joint distribution over all variables. These variables can be the decision-relevant ones (the particulars of the coin) or informational ones (the history of coin tosses or extra information on how the coin was obtained). Bayes' theorem has us update this prior in the usual way. While there are some examples where this extra information can be neatly summarized into a simple sufficient statistic (e.g., the number of tosses and number of heads for coins with a given probability of landing heads and independently distributed outcomes given the coin), those examples are the exception.
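A small sketch of why counts are a sufficient statistic only in special cases, using a made-up model that is not from the comment. Under i.i.d. flips, the likelihood depends only on the number of heads and tails, so HHTT and HTHT are interchangeable. Once a hypothetical "sticky" coin is on the table, one that repeats its previous face with probability 4/5, the order of the flips changes the posterior.

```python
from fractions import Fraction as F


def iid_likelihood(seq: str, p: F) -> F:
    """Likelihood of a flip sequence if flips are i.i.d. with P(heads) = p."""
    result = F(1)
    for flip in seq:
        result *= p if flip == "H" else 1 - p
    return result


def sticky_likelihood(seq: str, stay: F) -> F:
    """Hypothetical Markov coin: fair first flip, then repeats the previous face with prob `stay`."""
    result = F(1, 2)
    for prev, cur in zip(seq, seq[1:]):
        result *= stay if cur == prev else 1 - stay
    return result


def p_sticky(seq: str, stay: F = F(4, 5)) -> F:
    """Posterior on the sticky coin versus a fair i.i.d. coin, starting from 50/50 odds."""
    s, f = sticky_likelihood(seq, stay), iid_likelihood(seq, F(1, 2))
    return s / (s + f)


for seq in ("HHTT", "HTHT"):  # same counts: 2 heads, 2 tails
    print(seq, iid_likelihood(seq, F(2, 3)), p_sticky(seq))
```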

To recap, Bayesians are not "making incorrect technical arguments insinuating that the estimated probability alone should be enough for everyone." They are making correct arguments that fail only in a very small subset of problems, those with information acquisition that is affected by the decisions. In this sense, it is not "a valid and important question to ask." Furthermore, it is not clear that "Bayesians ought to get a habit of volunteering this information unprompted" because this information, besides being irrelevant to most decisions, is not easy to communicate succinctly.

They are making correct arguments that fail only in a very small subset of problems, those with information acquisition that is affected by the decisions.

I disagree that this is a very small subset of problems; the majority of real-life problems let you decide to wait and collect more information, or decide how many resources you're willing to bet. See examples in https://en.wikipedia.org/wiki/Multi-armed_bandit
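As one concrete instance of the bandit setting linked above, here is a minimal Thompson sampling sketch. Thompson sampling is one standard bandit strategy among several, and the arm probabilities are hypothetical. Each round, it draws a plausible payout rate for each arm from that arm's Beta posterior and pulls the arm with the highest draw. An arm with little data and a wide posterior therefore keeps getting tried even when its mean looks worse, so the spread, not only the point estimate, drives the decision.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Return how many times each arm was pulled under Thompson sampling with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = [0] * len(true_probs)
    losses = [0] * len(true_probs)
    for _ in range(rounds):
        # Draw a plausible payout rate for each arm from its Beta(1 + wins, 1 + losses) posterior.
        draws = [rng.betavariate(1 + won, 1 + lost) for won, lost in zip(wins, losses)]
        arm = max(range(len(true_probs)), key=draws.__getitem__)
        if rng.random() < true_probs[arm]:  # simulate pulling the chosen arm
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [won + lost for won, lost in zip(wins, losses)]


# Hypothetical arms: unknown to the player, the second pays out more often.
print(thompson_sampling([0.3, 0.7], rounds=2000))
```

Most pulls should go to the second arm, while the first still gets occasional pulls as long as its posterior remains wide.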

For example, I think I first noticed this problem many years ago in one of Scott's linkdumps, where he disapprovingly linked to Obama saying that the CIA told him that such and such thing had a 70% probability, but really they had no good information, so it was a coinflip. And Scott was indignant: 70% is 70%, what more do you need to know before you authorize some military operation, even the President doesn't understand probability, smdh. In my opinion Obama was right, if technically imprecise, while Scott was wrong, which demonstrates the danger of having a little knowledge and also the need for more awareness.

is not easy to communicate succinctly.

You say this as if it's not Bayesians' fault that they have not developed (or gotten into the habit of using) a succinct way of conveying how much of the estimate comes from the prior and how much from previous updates. I would understand if it were an acknowledged hard problem in need of the community's attention, but, for example, Yudkowsky's Sequences don't mention it at all.
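Here is one candidate succinct format, offered as a sketch rather than an established convention. In the Beta-prior coin model, the posterior mean is exactly a weighted average of the prior mean and the observed frequency, with weight (a + b) / (a + b + n) on the prior. Reporting that weight alongside the estimate therefore says how much of the estimate came from the prior. The uniform Beta(1, 1) default below is an assumption.

```python
from fractions import Fraction as F


def summarize(heads: int, tails: int, prior_a: int = 1, prior_b: int = 1):
    """Return (estimate, prior_weight) for a Beta(prior_a, prior_b) prior after the given flips.

    estimate = prior_weight * prior_mean + (1 - prior_weight) * observed_frequency,
    so prior_weight is the share of the estimate that still comes from the prior.
    """
    prior_strength = prior_a + prior_b
    n = heads + tails
    estimate = F(prior_a + heads, prior_strength + n)
    prior_weight = F(prior_strength, prior_strength + n)
    return estimate, prior_weight


for heads, tails in ((0, 0), (50, 50)):
    est, weight = summarize(heads, tails)
    print(f"{est} (prior weight {weight})")
```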

However in the real world there's almost always an option to wait and collect more data, and whether you want to exercise it critically depends on the difference between "it's a 50/50 chance based on observing 100 coinflips" and "it's a 50/50 chance based solely on the prior I pulled out of my ass".

This also ties into the longstanding discussion about the calibration of 50/50 predictions. One problem with a 50/50 prediction of a binary event (as in the post) is that the two phrasings are equivalent: saying "50% chance of tails" is literally the same as saying "50% chance of heads", because both are part of the same prediction, "50% chance of heads and 50% chance of tails", which accounts for every outcome.

This is also a well-known weakness: you can really pad your calibration record by adding many 50/50 predictions phrased as binary, such as whether bitcoin will have a value greater than X before January 1st, 2025 (yes/no) or whether you will get married, etc. Just formulate 1,000 such independent scenarios, literally flip a coin to assign yes/no answers, and you should do well.
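Here is a quick simulation of the padding trick, with made-up event probabilities. Whatever the true probability q of an event, a prediction whose yes/no phrasing is picked by a fair coin comes true with probability q/2 + (1 - q)/2 = 1/2. The 50% bucket therefore looks perfectly calibrated while carrying no information.

```python
import random


def coin_flip_hit_rate(true_probs, seed=0):
    """Fraction of '50%' predictions that come true when the yes/no phrasing is chosen by coin flip."""
    rng = random.Random(seed)
    hits = 0
    for q in true_probs:
        happened = rng.random() < q         # simulate the real-world outcome
        predicted_yes = rng.random() < 0.5  # phrase the prediction as "yes" or "no" at random
        hits += happened == predicted_yes
    return hits / len(true_probs)


# Hypothetical events, deliberately lopsided: half nearly certain, half nearly impossible.
events = [0.99] * 500 + [0.01] * 500
print(coin_flip_hit_rate(events))  # close to 0.5 regardless of the true probabilities
```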

I am not a paying subscriber, so I cannot access the post in question to check whether my objection is addressed. I think there's a simpler problem than what you've articulated here. Consider two statements: "I think X occurs with probability 50%" and "I think X is equally likely to have any probability [0..100]". There is a sense in which both statements are "the same", because the expected probability of a uniform distribution over [0..100] is 50, but the statements (to me) clearly convey different information. Sure, if you're forced to give a particular integer value for a statement's probability you would choose "50" in both cases, but there is clearly a distinction between the subjective states that lead to that same probability. The assertion that you should use 50% feels like an attempt to treat these two statements as equivalent when they aren't.

This is basically Bayesian-vs-frequentist. I think the counterargument would be "the statement that X is likely to have a probability isn't even coherent; that's a type error". You can say that a class of events has an objectively true rate of occurrence, i.e. if a coin will be thrown 100 times, then there will be a factual number of heads that show up, but you cannot say that any individual coin throw has a likelihood of having a likelihood; that's just a simple likelihood. In other words, you can assign 10% probability to a model of the coin in which it has a 60% probability of landing on heads, but the word "probability" there carries two different meanings: observational credence (subjective) vs outcome ratio (objective). You can't have a credence over a credence; one is observational, the other is physical.

Not sure if that makes sense.

Rephrase my second statement slightly. "I have no bias towards any number [0..100] as the probability for X." Does that convey the same information as "I think X occurs with probability 50%?"

Yes, but it's nearly impossible to genuinely have no bias about X; to have absolutely no bias, X has to be decoupled from any causal modeling. We have a bias about almost anything that happens in the world, so I think this just makes for bad intuition because it's such a corner case.

Sure. I don't intend to make any particular claim about how often one is actually in the described state. My point is that Scott is wrong when he says you should say something happens with probability 50% if one finds themselves in the described state.

Why couldn't you just nest them? If I have a lottery ticket that pays off in other lottery tickets which finally pay money, then there are likelihoods of having a likelihood. You could of course calculate the average likelihood but sometimes this information is useful. Another example, if I have a game-theory situation where one player has beliefs over the beliefs of the other player, I have probabilities over probabilities.
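Here is a minimal sketch of the nested-lottery point with made-up numbers. Reducing the compound lottery by the law of total probability gives a single payout probability, which is the "average likelihood" the comment mentions. The variance of the inner probability is the extra information that the reduction throws away.

```python
from fractions import Fraction as F

# Hypothetical compound lottery: the first draw hands you one of two second-stage tickets.
# Each entry is (probability of receiving that ticket, probability that the ticket pays out).
COMPOUND = [(F(1, 3), F(9, 10)), (F(2, 3), F(3, 10))]


def reduce_lottery(tickets):
    """Law of total probability: the overall chance that the compound lottery pays out."""
    return sum(p_ticket * p_pay for p_ticket, p_pay in tickets)


def spread_of_inner_probability(tickets):
    """Variance of the second-stage payout probability: information the reduction discards."""
    mean = reduce_lottery(tickets)
    return sum(p_ticket * (p_pay - mean) ** 2 for p_ticket, p_pay in tickets)


print(reduce_lottery(COMPOUND))               # 1/2, same as a single fair ticket
print(spread_of_inner_probability(COMPOUND))  # 2/25, versus 0 for a single fair ticket
```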

AIUI technically speaking you have conditional probabilities, but that's not quite a "likelihood of having a likelihood" but "a likelihood given a precondition event which also has a likelihood".

I agree with this. I think this relates to another interesting problem in probability: "What's the probability that the 10^10^10-th prime is 3 mod 4?" It's tempting to say 1/2, since we know that the asymptotic density is 1/2 and we have no way of knowing. But this is inconsistent with the axioms of probability theory: since it's a statement with a definite answer, the probability has to be either 1 or 0 to be consistent.
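The density claim itself is easy to check empirically on small primes, as a sketch (the cutoff of 10^6 is arbitrary). This says nothing about the 10^10^10-th prime specifically, whose residue is already fixed, which is exactly the tension the comment describes.

```python
def prime_sieve(limit: int) -> list[int]:
    """All primes below `limit` (sieve of Eratosthenes); assumes limit >= 2."""
    is_prime = bytearray([1]) * limit
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(limit**0.5) + 1):
        if is_prime[i]:
            # Cross out every multiple of i from i*i upward.
            is_prime[i * i :: i] = bytes(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(is_prime) if flag]


odd_primes = [p for p in prime_sieve(10**6) if p > 2]
share = sum(p % 4 == 3 for p in odd_primes) / len(odd_primes)
print(share)  # close to 1/2
```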

Even this is operating on a critical assumption that the probable outcome must be the true outcome. What if it isn't?

I have no idea what you could possibly mean. True statements have probability 1, that's axiomatic.

True statements have probability 1, that's axiomatic.

Yes, and therein lies the fundamental contradiction/weakness of Bayesian reasoning. A cursory examination of the world around us will show that improbable things happen all the time and thus one must conclude that the probability of improbable things occurring is 1.

Improbable events do not happen every time an improbable event could happen, so the probability of something improbable happening in a particular instance is not 1.

The probability that "something improbable will happen today somewhere in the world" is 1-epsilon, but that's correct.
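The two replies above distinguish a per-instance probability from the probability that something happens at least once across many independent chances. Here is a minimal sketch with illustrative numbers: a one-in-a-million event and hypothetical counts of chances.

```python
from math import expm1, log1p


def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one of n independent chances, each with probability p, comes up).

    Computes 1 - (1 - p)**n as -expm1(n * log1p(-p)) to stay accurate for tiny p.
    """
    return -expm1(n * log1p(-p))


print(prob_at_least_one(1e-6, 1))      # a single one-in-a-million chance
print(prob_at_least_one(1e-6, 10**6))  # about 0.632
print(prob_at_least_one(1e-6, 10**8))  # indistinguishable from 1 in floating point
```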