Pay no attention to the Model Behind the Curtain!

Many widely used models amount to an elaborate means of making up numbers—but once a number has been produced, it tends to be taken seriously and its source (the model) is rarely examined carefully. Many widely used models have little connection to the real-world phenomena they purport to explain. Common steps in modeling to support policy decisions, such as putting disparate things on the same scale, may conflict with reality. Not all costs and benefits can be put on the same scale, not all uncertainties can be expressed as probabilities, and not all model parameters measure what they purport to measure. These ideas are illustrated with examples from seismology, wind-turbine bird deaths, soccer penalty cards, gender bias in academia, and climate policy.

If the point is merely that you shouldn't blindly take numbers from models at face value, that point is well taken. But, for all his criticism of models, Stark doesn't really give any alternatives. Ultimately, we as people and civilizations have number-related decisions to make and those numbers/decisions have to come from somewhere.

Anyway, more generally, I have a hard time taking someone's abstract arguments seriously, when their concrete arguments are bad. For example, Stark refutes the idea of using expected value, saying

There is evidence that human preference orderings are not based on probability times consequences, i.e., on expected returns or expected losses. For instance, there is a preference for “sure things” over bets. Many people would prefer to receive $1 million for sure than to have a 10% chance of receiving $20 million, even though the expected return of the latter is double

This is correct, but it is solved by using an increasing, concave function to convert wealth to utility - an idea that is extremely old and is almost always used in actual economic literature. See, for instance, virtually anything written on investment in the last 50 years.
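To make that concrete, here is a minimal sketch (assuming log utility and a starting wealth of $100,000, both made-up illustrative numbers rather than anything from Stark or the economics literature) showing how a concave utility function reproduces the preference for the sure $1 million:

    import math

    def expected_utility(outcomes, wealth=100_000, utility=math.log):
        """Expected utility of a list of (probability, payoff) pairs,
        evaluated on total wealth with a concave (here, logarithmic) utility."""
        return sum(p * utility(wealth + payoff) for p, payoff in outcomes)

    sure_thing = [(1.0, 1_000_000)]
    gamble = [(0.1, 20_000_000), (0.9, 0)]

    print(expected_utility(sure_thing))  # ~13.91
    print(expected_utility(gamble))      # ~12.04: lower, even though its expected dollar value is double

The exact numbers don't matter; any increasing, concave utility with diminishing returns at this scale gives the same ordering.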

He then says

In a repeated game, basing choices on expectations might make sense, but in a single play, other considerations may dominate

But, again, the traditional academic approach is to maximize log-wealth in iterated games, since this strategy (with probability approaching 100%) outperforms naive linear wealth optimization.

These aren't obscure ideas he's ignoring - they're ideas you'd expect most economics undergraduates to encounter, let alone a professor. Why isn't he mentioning them? Lack of knowledge? Dishonesty? Thinking the counter-arguments are obvious? I don't know, but I do know those paragraphs burn a lot of credibility in my eyes.

Wait, why does log-wealth work, in particular? I’m having a hard time seeing how that stacks up vs. linear. Does it just capture that concave function in the limit?

Err... re-reading my comment, that second part is probably a tad unfair to Stark (though I stand by the first quote being a pure weak-man).

If you're still interested (and don't already know), the relevant search term is "Kelly Criterion". It is mostly limited to betting/investing and the central insight is that

  1. Since returns scale linearly with the amount invested, the money you have after a large number of sequential bets is your starting wealth times the product of the per-bet return factors.

  2. The log of a product is the sum of the logs, so log-wealth is a sum of per-bet log-returns.

  3. Therefore, by the law of large numbers, your average log-return converges (with probability approaching 1) to the expected log-return.

So, in the long run, with near-certainty, optimizing for expected log-wealth actually optimizes realized wealth, or any other increasing function of wealth.
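If you want to see this play out numerically, here is a small simulation sketch (the 60% win probability, even-money payoffs, and 1,000 rounds are all made-up illustrative parameters): a Kelly-style bettor who maximizes expected log-wealth by staking a fixed fraction ends up, with overwhelming probability, far ahead of a bettor who maximizes expected (linear) wealth by staking everything each round.

    import random

    def simulate(fraction, p_win=0.6, rounds=1000, wealth=1.0, seed=0):
        """Repeatedly stake `fraction` of current wealth on an even-money bet
        won with probability p_win; return the final wealth."""
        rng = random.Random(seed)
        for _ in range(rounds):
            stake = fraction * wealth
            wealth += stake if rng.random() < p_win else -stake
        return wealth

    kelly_fraction = 0.6 - 0.4   # f* = p - q for an even-money bet

    print(simulate(kelly_fraction))  # typically enormous: grows roughly like exp(rounds * E[log growth])
    print(simulate(1.0))             # the "maximize expected wealth" bettor: the first loss wipes them out

The all-in strategy has the higher expected wealth on paper, but almost every sample path goes to zero; the log-optimal strategy is the one that wins in realized wealth.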

To quantify costs in a cost–benefit analysis, in effect you must assign a dollar value to human life, including future generations; to environmental degradation; to human culture; to endangered species; and so on. You must believe that scales like “quality adjusted life-years” or “utility” are meaningful, reasonable, and a sound basis for decisions

I'm not sure what this has to do with models, but I don't really like this section. I actually agree that there's a lot of abuse of quantification (combining unlike things or scales, performing operations on values that treat them as cardinal even though they're really ordinal, etc.) and that qualitative analysis can be very valuable. But any policy requires you to compare things that people are often hesitant to put numerical or dollar values on. Saying you can't quantify e.g. the value of human culture or endangered species doesn't change the fact that you're going to be faced with proposals to spend $30 million on helping victims of wildfire damage in California, $25 million on food aid for Nigeria, $100 million on carbon-capture technology research, $50 million on local art for a park, and $17 million on saving the wide-tailed blubberfish from extinction - as well as questions of how to set the property tax rate vs the sales tax rate, or whether to require barbers to be licensed (trading off very different benefits on each side) - and you have to have some way of deciding which of those proposals make sense to support. Refusing to give a quantitative reason for your decision doesn't change the fact that each decision implicitly places a dollar value on each of those things.

There are many phenomena for which the frequency theory makes sense (e.g., games of chance where the mechanism of randomization is known and understood) and many for which it does not. What is the probability that global average temperature will increase by three degrees in the next 50 years? What is the chance there will be an earthquake with magnitude 8 or greater in the San Francisco Bay area in the next 50 years? Can we repeat the next 50 years over and over to see what fraction of the time that happens, even in principle?

There's no actual principled distinction here. The next flip of a coin may also be different from the last--the air currents in the room may have shifted, or the coin was damaged after the first toss. The question is "when are future events sufficiently similar to past ones to treat them as draws from some single empirical process?" which is basically the entire point of models, so this argument appears to be circular.

Also, I believe that Cox's theorem does, in fact, imply that all uncertainty is essentially probability. "You do not know how to accurately calculate the probability" is not the same as "probability is meaningless here," any more than the difficulty of solving the 3-body problem means that Newtonian mechanics doesn't apply.

That said, LeCam (1977, pp. 134–135) offers the following observations:

Without reading the reference, all of the claims in this section seem to be incoherent or just wrong.

Attempting to combine aleatory and epistemic uncertainties by considering both to be ‘probabilities’ that satisfy Kolmogorov’s axioms amounts to claiming that there are two equivalent ways to tell how much something weighs: I could weigh it on an actual physical scale or I could think hard about how much it weighs. The two are on a par. It claims that thinking hard about the question produces an unbiased measurement. Moreover, it claims that I know the accuracy of my internal ‘measurement’ from careful introspection. Hence, I can combine the two sources of uncertainty as if they were independent measurements of the same thing, both made by unbiased instruments.

As far as I can tell, this is a strawman of Bayesianism, and misses the whole point of Bayesian updating. Moreover, I think the author is actually making the error they accuse others of making, just in reverse: Just like giving the same name doesn't make 2 things the same, giving them different names doesn't make them different. Combining different "kinds" of uncertainty--such as incorporating uncertainty in the distribution and the inherent randomness of the outcome into one probability estimate of an outcome--is actually quite easy.
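For instance, here is a minimal sketch (a coin with unknown bias, a uniform prior, and made-up data of 8 heads in 10 flips, none of which comes from the paper) of how epistemic uncertainty about the bias and the aleatory randomness of the flip combine into a single predictive probability and get updated together:

    import random

    def predictive_prob_heads(bias_samples):
        """Predictive probability of heads: the aleatory randomness of the flip
        averaged over epistemic uncertainty about the coin's bias."""
        return sum(bias_samples) / len(bias_samples)

    def update(bias_samples, heads, flips, seed=0):
        """Crude importance-resampling update: reweight prior samples of the bias
        by the binomial likelihood of the observed data, then resample."""
        weights = [b**heads * (1 - b)**(flips - heads) for b in bias_samples]
        return random.Random(seed).choices(bias_samples, weights=weights, k=len(bias_samples))

    rng = random.Random(0)
    prior = [rng.random() for _ in range(10_000)]    # epistemic: bias unknown, uniform prior
    print(predictive_prob_heads(prior))              # ~0.5 before seeing any data
    posterior = update(prior, heads=8, flips=10)     # data shrinks the epistemic part
    print(predictive_prob_heads(posterior))          # ~0.75, the Beta(9, 3) predictive mean

Nothing here requires keeping the two "kinds" of uncertainty on separate books; they flow through the same probability calculus.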

The extended discussion of human biases is irrelevant.

This is just the law of total probability and the multiplication rule for conditional probabilities, but where is it coming from? That earthquakes occur at random is an assumption, not a matter of physics. Seismicity is complicated and unpredictable: haphazard, but not necessarily random. The standard argument to calibrate the PSHA fundamental relationship requires conflating rates with probabilities. For instance, suppose a magnitude eight event has been observed to occur about once a century in a given region. PSHA would assume that, therefore, the chance of a magnitude 8 event is 1% per year.
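For concreteness, the rate-to-probability step being described is typically a Poisson-process assumption; my reconstruction of the arithmetic (not a quotation from the paper) is:

    \lambda = \frac{1\ \text{event}}{100\ \text{years}}, \qquad
    P(\text{at least one event in a year}) = 1 - e^{-\lambda \cdot 1\,\text{yr}} = 1 - e^{-0.01} \approx 0.995\% \approx 1\%.

Stark's complaint is precisely about the legitimacy of treating the observed rate as the parameter of such a process in the first place.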

Distinguishing between a model that accounts for well-understood underlying causal processes, like a simulation of orbital mechanics, and a purely statistical model, is quite important. However, the insight of Bayesianism is that it does make sense to use the tools of randomness, regardless of the details of why an outcome is uncertain. Theoretically, a coin is subject to chaotic physical processes and you could predict its outcome perfectly with enough information. Its apparent randomness is entirely due to lacking that information (and the ability to process it and do physics on the data), which is fundamentally no different from earthquakes. The whole reason we're using purely statistical models for earthquakes, which have basically failed to produce any forecast which does better than "each year has fixed probability X of an earthquake of size Y", is because for earthquakes, we don't have the physical understanding or data.

In contrast, weather forecasts have become substantially more accurate over the past 50 years in large part because we can use models of underlying physical processes like fluid dynamics. All of this is well known. But these claims:

First, there is an epistemic leap from a rate to the existence of an underlying, stationary random process that generated the rate, as discussed above (see the quotation from Klemeš in particular). Second, it assumes that seismicity is uniform, which contradicts the observed clustering of seismicity in space and time. Third, it ignores the fact that the data at best give an estimate of a probability (if there is such a probability), not the exact value.

are wrong. For his first point, this is not an assumption. Lots of people have attempted to predict earthquakes in a time-dependent way, and they've all failed miserably. Scientists fall back on the time-independent prediction because it's the only one that didn't prove to be completely wrong; he's just completely backwards on this point. For his second, there does sometimes appear to be clustering in some earthquake data, but it's not consistent and hasn't proven useful in making predictions. And the third is just irrelevant pedantry--who thinks of these estimates from historical data as being perfect? That these values have error bars doesn't make them useless, or make them not probabilities.

Three recent destructive earthquakes were in regions that seismic hazard maps said were relatively safe (Stein et al., 2012; see also Panza et al., 2014; Kossobokov et al., 2015). This should not be surprising, because PSHA is based on a metaphor, not on physics.

"Relatively" is doing a lot of heavy lifting. Sometimes 1-in-100 year events or 1-in-1000 year events happen, that's just what those statements mean. Places that rarely get earthquakes will still get them sometime, and if the process producing them is chaotic, then you might never get great predictions, even if you understand all of the underlying physics and have good data, just due to computational power limitations, like with orbital mechanics.

Refusing to give a quantitative reason for your decision doesn’t change the fact that each decision implicitly puts a dollar value on each of those things.

This just isn’t true. It might be the case that someone observing your decisions could impute some sort of indifference curves to you on that basis, but that is not at all the same as your actually having those indifference curves, nor as your valuing the items in terms of money. It may be that I have sufficient reasons for my decisions which make no reference whatsoever to cost and benefits, in which case it wouldn’t even make sense to ask the question about me. And it would be question-begging to simply assume that everyone must have some quantitative reasons for what they do.

Also, I believe that Cox’s theorem does, in fact, imply that all uncertainty is essentially probability.

Cox’s theorem, as stated on Wikipedia, assumes that you can assign a real number to the plausibility of every proposition. Why on earth would anyone believe that, much less that it reduces every uncertainty to a probability? Tell me, what real number should I assign to the plausibility of, “A dart thrown at the unit interval will land within the Vitali set?”

However, the insight of Bayesianism is that it does make sense to use the tools of randomness, regardless of why an outcome is uncertain.

How is that an “insight” of Bayesianism? It’s certainly an assumption of Bayesianism, but I don’t see how you could possibly prove it independently.

Sometimes 1-in-100 year events or 1-in-1000 year events happen, that’s just what those statements mean.

How do you distinguish that from the model being wrong? We don’t have 100 or 1000 years to wait and see whether the model performs properly out-of-sample or not. This defense could be used to exculpate any sort of model failing by just appealing to the Law of Large Numbers: “Well, everything with some chance has to happen eventually, so no actual event can be interpreted as decisively falsifying the model!”

It might be the case that someone observing your decisions could impute some sort of indifference curves to you on that basis, but that is not at all the same as your actually having those indifference curves, nor as your valuing the items in terms of money. It may be that I have sufficient reasons for my decisions which make no reference whatsoever to cost and benefits, in which case it wouldn’t even make sense to ask the question about me.

You cannot avoid the fact that you are going to end up comparing unlike things, because in life you face choices between dissimilar things. I didn't mention indifference curves at all; that's you putting words in my mouth. You don't need to invoke a full decision theory to make this point; it's enough to note that, for example, we spend money to reduce risk all the time, but we don't (and in fact, can't) spend infinite money to reduce risk (although some people try to do so when it isn't their money). This implies there is some amount of money you are willing to spend to save a life, and some amount you aren't (clearly simplifying here--real situations involve additional variables--but such complications only strengthen the point, since they involve deciding between an even wider variety of unlike things).

And it would be question-begging to simply assume that everyone must have some quantitative reasons for what they do.

I specifically noted that people avoid making explicit quantitative judgements. A lot of people also text while driving, but that doesn't mean it makes sense to do so.

“A dart thrown at the unit interval will land within the Vitali set?”

You mean an idealized dart? Like drawing from a uniform random variable on [0,1]? Formally, it's not defined, because probability is only defined on a sigma-algebra. This is irrelevant for all practical purposes; non-measurable sets require the AoC, and so won't be explicitly definable, and in any event you would never be able to tell if some specific number is in a Vitali-like set to know if your prediction was right or wrong.
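For anyone who wants the one-screen version of why no value can be assigned, here is the standard textbook argument (assuming countable additivity and translation invariance, and using the axiom of choice to build V, whose rational translates are pairwise disjoint):

    [0,1] \subseteq \bigcup_{q \in \mathbb{Q} \cap [-1,1]} (V + q) \subseteq [-1,2]
    \implies 1 \le \sum_{q \in \mathbb{Q} \cap [-1,1]} \mu(V + q) = \sum_{q} \mu(V) \le 3,

and no single value of \mu(V), whether zero or positive, can make that countable sum land between 1 and 3. So the "probability" has no consistent definition at all, which is different from being some unknown number.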

How is that an “insight” of Bayesianism? It’s certainly an assumption of Bayesianism, but I don’t see how you could possibly prove it independently.

It's an insight because the laws of probability don't actually have distinctions like "aleatory" vs "epistemic." Bayes' theorem is true regardless. These distinctions are, at best, useful in some situations to help solve specific problems, but they don't mean that probability doesn't apply.

How do you distinguish that from the model being wrong? We don’t have 100 or 1000 years to wait and see whether the model performs properly out-of-sample or not. This defense could be used to exculpate any sort of model failing by just appealing to the Law of Large Numbers: “Well, everything with some chance has to happen eventually, so no actual event can be interpreted as decisively falsifying the model!”

We actually do have some data on the historical occurrence of earthquakes. But also, you can combine data from different locations. If you have 100 locations where you think magnitude X earthquakes occur about once per 1,000 years, and the assumptions about independence in time and space hold, then you should expect to see about 10 magnitude X earthquakes per century across those locations.
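A minimal sketch of that aggregation (the 100 regions, the 1-in-1,000-year rate, and the independence assumptions are taken from the hypothetical above, not from any real hazard map):

    import math

    regions = 100
    annual_rate = 1 / 1000      # assumed rate of magnitude-X events per region per year
    years = 100

    expected_events = regions * annual_rate * years   # = 10 events per century across all regions

    # Under the Poisson/independence assumptions, the total count is ~Poisson(10),
    # so observing very few such quakes would itself be strong evidence against the model:
    p_two_or_fewer = sum(math.exp(-expected_events) * expected_events**k / math.factorial(k)
                         for k in range(3))

    print(expected_events)   # 10.0
    print(p_two_or_fewer)    # ~0.0028

So even for individually rare events, aggregation across space gives the model out-of-sample exposure well within a human timescale.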

In general "we don't have a lot of data on rare events" can be a problem, but it's a problem which again is well-known and has nothing to do with the rest of the paper. Certainly this statement in the original paper:

Three recent destructive earthquakes were in regions that seismic hazard maps said were relatively safe

is devoid of content. It's like going to a list of safe countries and then digging through the news to find a recent murder in one of them. What does that tell you? Absolutely nothing.

This implies that there is some amount of money that you are willing to spend to save a life, and some amount you aren’t

But that need not be because of how much money it is, but because money is fungible to other projects that we care about. E.g. maybe saving lives is infinitely valuable, but spending more than a million dollars to save one would stop you from saving two lives at 500k each. Then it would be rational to refuse to spend more than $1 million to save a life, but not because there’s some absolute sense in which a life is worth “only” a million dollars. Money is a numeraire between values, not a genuine value itself.

Formally, it’s not defined, because probability is only defined on a sigma-algebra.

Then it’s not true that every proposition can be assigned a real-valued degree of plausibility (unless you want to put all of your probability mass on a countable set, as this paper discusses). Either it would have to be imprecise or undefined.

Of course, nobody in practice could have a sigma-algebra of uncountably-many sets of real numbers in their head either, nor can anyone mentally represent probabilities to infinite real-valued precision. Orthodox Bayesianism also takes on these entirely impractical assumptions. The Vitali set was simply an example, but you could just as well use a non-measurable set to represent an agent’s imprecise credence in a prosaic proposition about which they have genuine ignorance, rather than mere uncertainty. I don’t see what’s problematic about that.

But that need not be because of how much money it is, but because money is fungible to other projects that we care about. E.g. maybe saving lives is infinitely valuable

People often declare that lives are infinitely valuable and then proceed to spend money on entertainment rather than donating it to save lives. Nearly no one actually behaves as if lives were infinitely valuable.

In practice, yes. It was just an example.

But that need not be because of how much money it is, but because money is fungible to other projects that we care about

Yes, obviously, that is the point. You are still comparing a human life to that other project, whose benefits may or may not be measured in lives saved, and you're sometimes going to be faced with such a decision.

maybe saving lives is infinitely valuable, but spending more than a million dollars to save one would stop you from saving two lives at 500k each

Until you more precisely define what you mean by "infinitely valuable," this statement is meaningless, because two infinitely valuable things need not be worth more than one. But also, in practice, literally no one's behavior reflects such a claim, and since you can't have everything be infinitely valuable, you would still be faced with many other "noncomparable" decisions, like, to use the paper's examples, culture and the environment.

(unless you want to put all of your probability mass on a countable set, as this paper discusses

Is there any instance in which doing so would be empirically distinguishable from a truly continuous distribution or outcome space? We can only make measurements that have rational values, for example. Using real numbers is often a very good approximation to something that is actually discrete (like molecules in a fluid), one that avoids even more tedious inequality-chasing, but it isn't necessary. And if you don't take the axiom of choice, the response to "what about non-measurable sets?" is "What are you talking about? Should I also consider what happens when 1+1 is 3?"

Moreover, that paper says:

Propositions need not get a probability, but may instead be assigned a range of probabilities.

which, as far as I can tell, hasn't actually avoided the alleged problem.

The Vitali set was simply an example, but you could just as well use a non-measurable set to represent an agent’s imprecise credence in a prosaic proposition about which they have genuine ignorance, rather than mere uncertainty.

There are models of ZF (with a weaker version of choice, even) in which all subsets of R are Lebesgue measurable. If you want I suppose you could develop an epistemology where some propositions have undefined probability, but if I choose not to use choice, are you going to say that doing so must be wrong, because my model of the world contains no such sets? After all, the original paper is claiming that there definitely are hypotheses to which probability cannot be applied at all.

I don’t see what’s problematic about that.

Well, for one, if it has an undefined prior probability, how do you do any sort of update based on evidence? If you receive some information on it, how would you know how strongly it supports your hypothesis compared to others, such as "the measurement device was flawed"? But again, even if you would like to think this way, it doesn't mean that an alternative is wrong.

Until you more precisely define what you mean by "infinitely valuable," this statement is meaningless because 2 infinite things may be identical to 1.

As in, more valuable than any consideration not to do with saving lives.

But also, in practice, literally no one's behavior reflects such a claim, and since you can't have everything be infinitely valuable, you would still be faced with many other "noncomparable" decisions, like, to use the paper's examples, culture and the environment.

I don't think your empirical claim is true. And I don't know what you mean by "non-comparable." If one factor gets absolute priority, then nothing is non-comparable.

Is there any instance in which doing so would be empirically distinguishable from a truly continuous distribution or outcome space?

It wouldn't reflect orthodox Bayesianism, that's for sure. Also, it would make your probability measure no longer translation-invariant or dilation-linear, which is bad. Now it seems like you're falling back to naïve operationalism instead of actually defending Bayesianism. And if all you're concerned with is some pre-theoretical conception of what's "empirically distinguishable," then why bother with infinities at all instead of just sticking to finite sigma-algebras? No one ever actually observes countable infinities of events either.

which, as far as I can tell, hasn't actually avoided the alleged problem.

Except they explicitly deny that that's an adequate solution in the paper: "Although we argue for imprecise credences, in §6 we argue against the standard interpretation of imprecise credences as sets of precise probabilities."

And if you don't take the axiom of choice, the response to "what about non-measurable sets?" is "What are you talking about? Should I also consider what happens when 1+1 is 3?"

Only if you have literally 0 credence that the axiom of choice might be true. Arguably even that's not enough: you would need to exclude any propositions involving that axiom from your sigma-algebra (in which case Cox's theorem fails again), because many prominent Bayesians (e.g. Al Hajek) think Bayesianism should be extended with primitive conditional probabilities to allow conditionalization on any non-impossible proposition. Otherwise, you're still going to have to deal with questions like, "If the Axiom of Choice were true, then what would be the probability that a fair dart on the unit interval hits an element of the Vitali set"?

If you want I suppose you could develop an epistemology where some propositions have undefined probability, but if I choose not to use choice, are you going to say that doing so must be wrong, because my model of the world contains no such sets?

I think that it would be a defective model because, as discussed above, it would be incapable of even contemplating alternatives within its own framework. But even putting that aside, that would be a substantial retreat from your original claim that Cox's theorem proves that all uncertainty is reducible to (precise) probability.

After all, the original paper is claiming that there definitely are hypotheses to which probability cannot be applied at all.

And you said that there definitely aren't, so you made an equally strong claim, which is what I took issue with, not the idea that there's no coherent model where that's true.

Well, for one, if it has an undefined prior probability, how do you do any sort of update based on evidence?

But I'm not saying it would be undefined, just that it would be imprecise.

But again, even if you would like to think this way, it doesn't mean that an alternative is wrong.

See above.

I don't think your empirical claim is true.

No one that I'm aware of has given up all non-essential consumption, and attempted to force others to do so, in order to save more lives.

And I don't know what you mean by "non-comparable." If one factor gets absolute priority, then nothing is non-comparable.

What you said doesn't allow you to compare art and the environment. I already gave this example, so I don't know why you're so confused.

It wouldn't reflect orthodox Bayesianism, that's for sure. Now it seems like you're falling back to naïve operationalism instead of actually defending Bayesianism.

What? This has nothing to do with Bayesianism. We can only measure things to finite accuracy, so we will never know if a result is exactly pi meters long. And the universe itself may be discrete, so that concept may not even make sense. Similarly, we can never explicitly describe all of the possible outcomes in an uncountable set. You could use rational numbers for all of the relevant math we do; it would just be harder, for no real improvement.

But I'm not saying it would be undefined, just that it would be imprecise.

So what does this have to do with the paper being discussed?

"If the Axiom of Choice were true, then what would be the probability that a fair dart on the unit interval hits an element of the Vitali set"?

Asking if an axiom of mathematics is true is a nonsense question. We have different systems of mathematics, with different sets of axioms. As long as your system is consistent, it is not any more or less "true" than a different system that is also consistent. What you could ask is something like "within ZFC, what is P(X in V)?" (where X ~ U(0,1) and V is the Vitali set). And this is not a real number. But the original paper makes no such argument--it just asserts that probability doesn't always apply for reasons that are entirely unrelated to this argument. It certainly never mentions that you must assume the axiom of choice for this argument; given the author is a statistician, I would be surprised if he knew any of this. This discussion is also unrelated to science: Such sets are never going to be relevant in practice, and even if you assume the AoC, you can never explicitly define any non-measurable sets.

Most people believe in deontological constraints in addition to value maximization, so even if they thought that saving lives were infinitely valuable, they wouldn’t necessarily force others to try to do so. Cf. Christians who believe saving souls is infinitely valuable, but don’t try to force everyone to convert because they think there are deontological constraints against forced conversion. And plenty of people have been willing to sacrifice all unnecessary consumption and force others to do so to save lives, see e.g. lots of climate fanatics.

Making decisions about which thing to prioritize doesn’t require them to be comparable. There is a vast philosophical literature about decision theory involving incomparabilities, exactly none of which affirms that we can never rationally choose one incomparable thing over the other.

Whether or not we can measure things to arbitrary precision, that has little to no bearing on whether our probabilities should be arbitrarily precise, because probabilities are not all about empirical measurements. What is the probability of a fair dart thrown at [0,pi] landing within [0,1]? Hey, it’s exactly 1/pi, no arbitrarily-precise calculations necessary. Lots of probabilities are a priori and hence independent of empirical measurements.

It has to do with the paper because where do they say that the inapplicability of precise probabilities requires that the probability be undefined rather than imprecise? It seems obvious they’re talking specifically about precise probabilities, so clearly undefined probabilities are not the sole relevant alternative here.

What you are asserting about mathematical truth is a highly contested position in the philosophy of mathematics, and it’s not clear to me that it’s even coherent. If by “true” you literally mean “true,” then it can’t be coherent because then you’d have axiom systems with contradictory axioms which would both be true. If you don’t mean “true” literally then I don’t know what you mean by it.

I am not exclusively defending the original paper on its own terms, I just think your arguments against some of its conclusions rest on unsound premises for (at least partially) independent reasons. And my main issue is with your original implication that the conclusions you criticized are just obviously wrong. As we’ve seen in the course of this discussion, the premises you assert against those conclusions are highly non-obvious themselves, whether they’re true or not.

Most people believe in deontological constraints in addition to value maximization, so even if they thought that saving lives were infinitely valuable, they wouldn’t necessarily force others to try to do so

Yes, exactly. "The value of a soul" and "deontological considerations about forced conversion" are wildly different things that they are comparing - exactly the kind of comparison the author of this paper would call "quantifauxcation."

Making decisions about which thing to prioritize doesn’t require them to be comparable. There is a vast philosophical literature about decision theory involving incomparabilities, exactly none of which affirms that we can never rationally choose one incomparable thing over the other.

This sounds like a quibble over definitions. I would consider any decision between X and Y to constitute a comparison between them, by the common definition of the word "compare." You don't have to agree with that definition, but it seems like you do agree that people regularly decide between two things that are extremely different from each other, just as it is totally valid to say something like "the average person would decide to take X dollars in exchange for an increase of Y to their risk of death."

Lots of probabilities are a priori and hence independent of empirical measurements.

I think you're making a very, very different argument than was made in the original paper. Which is fine, but it's not really relevant to my argument. The probability you gave is exactly 1/pi, yes. As far as I can tell, this is unrelated to the claim that the use of probabilities for complex problems is inappropriate because you lack sufficient information to calculate a probability. As far as I know, no one says something like "the completely precise probability of conflict in the Korean peninsula this year is 5 + pi/50 percent." That's just a strawman.

What you are asserting about mathematical truth is a highly contested position in the philosophy of mathematics, and it’s not clear to me that it’s even coherent.

I don't think most logicians would tell you there's a definitive answer to the question, "Is the axiom of choice true?" Or, perhaps an even better example, the continuum hypothesis. But again, I don't think any of this is relevant to the claims being made in the paper. I guess you think that some of what I wrote isn't literally true, if interpreted in a different context than this thread?


Based on the abstract, this paper looks interesting, but do you have a bit more of a summary as to why it's in its own thread?

It seems like the kind of technical paper that people here would be interested in. It didn't seem particularly Culture War relevant so it didn't seem like the right fit for the megathread.