
Pay no attention to the Model Behind the Curtain!

link.springer.com

Many widely used models amount to an elaborate means of making up numbers—but once a number has been produced, it tends to be taken seriously and its source (the model) is rarely examined carefully. Many widely used models have little connection to the real-world phenomena they purport to explain. Common steps in modeling to support policy decisions, such as putting disparate things on the same scale, may conflict with reality. Not all costs and benefits can be put on the same scale, not all uncertainties can be expressed as probabilities, and not all model parameters measure what they purport to measure. These ideas are illustrated with examples from seismology, wind-turbine bird deaths, soccer penalty cards, gender bias in academia, and climate policy.


To quantify costs in a cost–benefit analysis, in effect you must assign a dollar value to human life, including future generations; to environmental degradation; to human culture; to endangered species; and so on. You must believe that scales like “quality adjusted life-years” or “utility” are meaningful, reasonable, and a sound basis for decisions.

I'm not sure what this has to do with models, but I don't really like this section. I actually agree that there's a lot of abuse of quantification (combining unlike things or scales, performing operations on values that treat them as cardinal even though they're really ordinal, etc.) and that qualitative analysis can be very valuable. But any policy requires you to compare things that people are, often, hesitant to put numerical or dollar values on. Saying you can't quantify e.g. the value of human culture or endangered species doesn't change the fact that you're going to be faced with proposals to spend $30 million on helping victims of wildfire damage in California, $25 million on food aid for Nigeria, $100 million to invest in carbon-capture technology research, $50 million to buy local art for a park, and $17 million to save the wide-tailed blubberfish from extinction, as well as questions of how to set the property tax rate vs the sales tax rate, whether to require barbers be licensed (trading off very different benefits on each side), and you have to have some way of deciding which of those proposals make sense to support. Refusing to give a quantitative reason for your decision doesn't change the fact that each decision implicitly places a dollar value on each of those things.

There are many phenomena for which the frequency theory makes sense (e.g., games of chance where the mechanism of randomization is known and understood) and many for which it does not. What is the probability that global average temperature will increase by three degrees in the next 50 years? What is the chance there will be an earthquake with magnitude 8 or greater in the San Francisco Bay area in the next 50 years? Can we repeat the next 50 years over and over to see what fraction of the time that happens, even in principle?

There's no actual principled distinction here. The next flip of a coin may also be different from the last--the air currents in the room may have shifted, or the coin was damaged after the first toss. The question is "when are future events sufficiently similar to past ones to treat them as draws from some single empirical process?" which is basically the entire point of models, so this argument appears to be circular.

Also, I believe that Cox's theorem does, in fact, imply that all uncertainty is essentially probability. "You do not know how to accurately calculate the probability" is not the same as "probability is meaningless here," any more than the difficulty of solving the 3-body problem means that Newtonian mechanics doesn't apply.

That said, LeCam (1977, pp. 134–135) offers the following observations:

Without reading the reference, all of the claims in this section seem to be incoherent or just wrong.

Attempting to combine aleatory and epistemic uncertainties by considering both to be ‘probabilities’ that satisfy Kolmogorov’s axioms amounts to claiming that there are two equivalent ways to tell how much something weighs: I could weigh it on an actual physical scale or I could think hard about how much it weighs. The two are on a par. It claims that thinking hard about the question produces an unbiased measurement. Moreover, it claims that I know the accuracy of my internal ‘measurement’ from careful introspection. Hence, I can combine the two sources of uncertainty as if they were independent measurements of the same thing, both made by unbiased instruments.

As far as I can tell, this is a strawman of Bayesianism, and misses the whole point of Bayesian updating. Moreover, I think the author is actually making the error they accuse others of making, just in reverse: just as giving two things the same name doesn't make them the same, giving them different names doesn't make them different. Combining different "kinds" of uncertainty--such as folding uncertainty about the distribution and the inherent randomness of the outcome into one probability estimate of an outcome--is actually quite easy.
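To make that concrete, here is a toy coin example (the biases and prior weights below are invented for illustration): the prior over the coin's bias is "epistemic" uncertainty, the flip itself is "aleatory," and ordinary marginalization plus Bayes' rule handles both at once with no special machinery.

```python
# Epistemic: we don't know the coin's bias, so we put a prior over candidates.
# (Illustrative numbers, not from the paper or thread.)
prior = {0.3: 0.5, 0.7: 0.5}   # P(bias = 0.3) = P(bias = 0.7) = 0.5

# Predictive probability of heads marginalizes over the unknown bias:
# P(heads) = sum over b of P(heads | bias=b) * P(bias=b).
# This single number already folds both "kinds" of uncertainty together.
p_heads = sum(b * w for b, w in prior.items())
print(p_heads)  # ≈ 0.5

# Bayesian updating after observing one head: posterior ∝ likelihood * prior.
posterior = {b: b * w for b, w in prior.items()}
z = sum(posterior.values())
posterior = {b: w / z for b, w in posterior.items()}
print(posterior)  # ≈ {0.3: 0.3, 0.7: 0.7}
```

The same probability calculus applies to both sources of uncertainty throughout; nothing in the computation needed a label for which uncertainty was which.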

The extended discussion of human biases is irrelevant.

This is just the law of total probability and the multiplication rule for conditional probabilities, but where is it coming from? That earthquakes occur at random is an assumption, not a matter of physics. Seismicity is complicated and unpredictable: haphazard, but not necessarily random. The standard argument to calibrate the PSHA fundamental relationship requires conflating rates with probabilities. For instance, suppose a magnitude eight event has been observed to occur about once a century in a given region. PSHA would assume that, therefore, the chance of a magnitude 8 event is 1% per year.

Distinguishing between a model that accounts for well-understood underlying causal processes, like a simulation of orbital mechanics, and a purely statistical model is quite important. However, the insight of Bayesianism is that it makes sense to use the tools of randomness regardless of why an outcome is uncertain. In principle, a coin is subject to chaotic physical processes, and you could predict its outcome perfectly with enough information. Its apparent randomness is entirely due to lacking that information (and the ability to process it and do the physics), which is fundamentally no different from earthquakes. The whole reason we use purely statistical models for earthquakes--which have basically failed to produce any forecast that does better than "each year has a fixed probability X of an earthquake of size Y"--is that for earthquakes we lack the physical understanding and the data.
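For what it's worth, the rate-to-probability step the paper objects to can be stated explicitly: it is the assumption of a memoryless (Poisson) process with constant rate, under which a once-per-century rate yields an annual probability close to, but not equal to, 1%. A minimal sketch:

```python
import math

# The "once per century, therefore 1% per year" step implicitly assumes a
# memoryless (Poisson) process with a constant rate. Under that assumption:
rate_per_year = 1 / 100   # observed long-run rate: one event per century

# P(at least one event in a year) = 1 - exp(-rate). For small rates this is
# numerically close to the rate itself, which is why the conflation usually
# goes unnoticed -- but it is an extra modeling assumption, not a tautology.
p_one_year = 1 - math.exp(-rate_per_year)
print(p_one_year)  # ≈ 0.00995, close to but not equal to 0.01
```

Making the assumption explicit is what lets you criticize it (or test it), rather than treating the 1%-per-year figure as either obvious or meaningless.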

In contrast, weather forecasts have become substantially more accurate over the past 50 years in large part because we can use models of underlying physical processes like fluid dynamics. All of this is well known. But these claims:

First, there is an epistemic leap from a rate to the existence of an underlying, stationary random process that generated the rate, as discussed above (see the quotation from Klemeš in particular). Second, it assumes that seismicity is uniform, which contradicts the observed clustering of seismicity in space and time. Third, it ignores the fact that the data at best give an estimate of a probability (if there is such a probability), not the exact value.

are wrong. On the first point: this is not an assumption. Lots of people have attempted to predict earthquakes in a time-dependent way, and they've all failed miserably. Scientists fall back on the time-independent prediction because it's the only one that didn't prove to be completely wrong; he has this exactly backwards. On the second: there does sometimes appear to be clustering in some earthquake data, but it's not consistent and hasn't proven useful for making predictions. And the third is just irrelevant pedantry--who thinks of these estimates from historical data as perfect? That these values have error bars doesn't make them useless, or make them not probabilities.

Three recent destructive earthquakes were in regions that seismic hazard maps said were relatively safe (Stein et al., 2012; see also Panza et al., 2014; Kossobokov et al., 2015). This should not be surprising, because PSHA is based on a metaphor, not on physics.

"Relatively" is doing a lot of heavy lifting. Sometimes 1-in-100 year events or 1-in-1000 year events happen, that's just what those statements mean. Places that rarely get earthquakes will still get them sometime, and if the process producing them is chaotic, then you might never get great predictions, even if you understand all of the underlying physics and have good data, just due to computational power limitations, like with orbital mechanics.
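A toy calculation (all numbers invented) shows why hits in "relatively safe" regions are unsurprising: even if each individual region really is safe, the chance that some such region gets hit within a decade can be large.

```python
# "Relatively safe" is a statement about each region individually, not about
# the ensemble of all such regions. Illustrative numbers:
p_per_region_per_year = 1 / 1000   # annual chance of a destructive quake
regions = 100
years = 10

# P(at least one hit somewhere) = 1 - P(no hits anywhere, all region-years)
p_somewhere = 1 - (1 - p_per_region_per_year) ** (regions * years)
print(p_somewhere)  # ≈ 0.63
```

So "three destructive earthquakes happened in regions the maps called relatively safe" is exactly what a correct map would predict, given enough safe regions and enough time.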

Refusing to give a quantitative reason for your decision doesn’t change the fact that each decision implicitly puts a dollar value on each of those things.

This just isn’t true. It might be the case that someone observing your decisions could impute some sort of indifference curves to you on that basis, but that is not at all the same as your actually having those indifference curves, nor as your valuing the items in terms of money. It may be that I have sufficient reasons for my decisions which make no reference whatsoever to costs and benefits, in which case it wouldn’t even make sense to ask the question about me. And it would be question-begging to simply assume that everyone must have some quantitative reasons for what they do.

Also, I believe that Cox’s theorem does, in fact, imply that all uncertainty is essentially probability.

Cox’s theorem, as stated on Wikipedia, assumes that you can assign a real number to the plausibility of every proposition. Why on earth would anyone believe that, much less that it reduces every uncertainty to a probability? Tell me, what real number should I assign to the plausibility of, “A dart thrown at the unit interval will land within the Vitali set?”

However, the insight of Bayesianism is that it does make sense to use the tools of randomness, regardless of why an outcome is uncertain.

How is that an “insight” of Bayesianism? It’s certainly an assumption of Bayesianism, but I don’t see how you could possibly prove it independently.

Sometimes 1-in-100 year events or 1-in-1000 year events happen, that’s just what those statements mean.

How do you distinguish that from the model being wrong? We don’t have 100 or 1000 years to wait and see whether the model performs properly out-of-sample or not. This defense could be used to exculpate any sort of model failing by just appealing to the Law of Large Numbers: “Well, everything with some chance has to happen eventually, so no actual event can be interpreted as decisively falsifying the model!”

It might be the case that someone observing your decisions could impute some sort of indifference curves to you on that basis, but that is not at all the same as your actually having those indifference curves, nor as your valuing the items in terms of money. It may be that I have sufficient reasons for my decisions which make no reference whatsoever to cost and benefits, in which case it wouldn’t even make sense to ask the question about me.

You cannot avoid comparing unlike things, because in life you face choices between dissimilar things. I didn't mention indifference curves at all; that's you putting words in my mouth. You don't need to invoke a full decision theory to make this point: it's enough to note that, for example, we spend money to reduce risk all the time, but we don't (and in fact, can't) spend infinite money to reduce risk (although some people try to do so when it isn't their money). This implies there is some amount of money you are willing to spend to save a life, and some amount you aren't (clearly simplifying here--real situations involve additional variables--but such complications only strengthen the point, since they involve choosing among an even wider variety of unlike things).
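As a toy illustration of the implicit valuation (all figures hypothetical):

```python
# Hypothetical: a town weighs a safety upgrade expected to prevent 10 deaths.
cost = 50_000_000          # price of the upgrade, dollars
deaths_prevented = 10      # expected lives saved

# Funding it implies a life is worth at least cost/deaths_prevented to the
# town; rejecting it implies a life is worth at most that. Either way, the
# decision bounds an implicit dollar value, whether or not anyone states one.
implied_value_per_life = cost / deaths_prevented
print(implied_value_per_life)  # 5000000.0
```

The point is not that anyone consciously holds the number, only that the decision pins it down within a range.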

And it would be question-begging to simply assume that everyone must have some quantitative reasons for what they do.

I specifically noted that people avoid making explicit quantitative judgements. A lot of people also text while driving, but that doesn't mean it makes sense to do so.

“A dart thrown at the unit interval will land within the Vitali set?”

You mean an idealized dart? Like a draw from a uniform random variable on [0,1]? Formally, it's not defined, because probability is only defined on a sigma-algebra. This is irrelevant for all practical purposes: non-measurable sets require the axiom of choice, so no such set can be explicitly defined, and in any event you could never tell whether some specific number is in a Vitali-like set to know if your prediction was right or wrong.
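For completeness, the standard argument for why no probability can be assigned here runs as follows (a textbook sketch, not original to this thread):

```latex
% No countably additive, translation-invariant probability $\lambda$ can
% assign the Vitali set $V \subset [0,1)$ a value. The rational translates
\[
  V_q = \{\, v + q \bmod 1 : v \in V \,\}, \qquad q \in \mathbb{Q}\cap[0,1),
\]
% are disjoint and partition $[0,1)$, so countable additivity and
% translation invariance would force
\[
  1 = \lambda\big([0,1)\big) = \sum_{q} \lambda(V_q) = \sum_{q} \lambda(V),
\]
% which is $0$ if $\lambda(V) = 0$ and diverges if $\lambda(V) > 0$ --
% a contradiction either way, so $\lambda(V)$ is simply undefined.
```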

How is that an “insight” of Bayesianism? It’s certainly an assumption of Bayesianism, but I don’t see how you could possibly prove it independently.

It's an insight because the laws of probability don't actually contain distinctions like "aleatory" vs. "epistemic." Bayes' theorem is true regardless. These distinctions are, at best, useful in some situations for solving specific problems, but they don't mean that probability doesn't apply.

How do you distinguish that from the model being wrong? We don’t have 100 or 1000 years to wait and see whether the model performs properly out-of-sample or not. This defense could be used to exculpate any sort of model failing by just appealing to the Law of Large Numbers: “Well, everything with some chance has to happen eventually, so no actual event can be interpreted as decisively falsifying the model!”

We actually do have some data on the historical occurrence of earthquakes. But also, you can combine data from different locations. If you have 100 locations where you think magnitude X earthquakes occur about once per 1,000 years, and the assumptions about independence in time and space hold, then you should expect to see about 10 magnitude X earthquakes per century across those locations.
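That aggregate prediction is testable in the ordinary way. A quick sketch using the hypothetical numbers above (100 sites, once per 1,000 years, independence assumed): under the model the century-wide count is approximately Poisson with mean 10, so an outcome like 25 events would effectively falsify it.

```python
import math

# Under the model, the count of magnitude-X events per century across all
# sites is roughly Poisson with mean
#   100 sites * 100 years * (1/1000 per site-year) = 10,
# so the model makes a testable aggregate prediction even though any single
# site's events are rare.
mean = 100 * 100 * (1 / 1000)
print(mean)  # 10.0

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed directly from the pmf."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Seeing 25 or more events in a century would be overwhelming evidence
# against the model: under it, that outcome has probability under 1 in 10,000.
p_25_or_more = 1 - poisson_cdf(24, mean)
print(p_25_or_more)  # ≈ 5e-05
```

So "we'd have to wait 1,000 years" overstates the problem: pooling across sites turns a rare-event model into one with checkable short-run consequences, provided the independence assumptions hold.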

In general "we don't have a lot of data on rare events" can be a problem, but it's a problem which again is well-known and has nothing to do with the rest of the paper. Certainly this statement in the original paper:

Three recent destructive earthquakes were in regions that seismic hazard maps said were relatively safe

is devoid of content. It's like going to a list of safe countries and then digging through the news to find a recent murder in one of them. What does that tell you? Absolutely nothing.

This implies that there is some amount of money that you are willing to spend to save a life, and some amount you aren’t

But that need not be because of how much money it is, but because money is fungible to other projects that we care about. E.g. maybe saving lives is infinitely valuable, but spending more than a million dollars to save one would stop you from saving two lives at 500k each. Then it would be rational to refuse to spend more than $1 million to save a life, but not because there’s some absolute sense in which a life is worth “only” a million dollars. Money is a numeraire between values, not a genuine value itself.

Formally, it’s not defined, because probability is only defined on a sigma-algebra.

Then it’s not true that every proposition can be assigned a real-valued degree of plausibility (unless you want to put all of your probability mass on a countable set, as this paper discusses). Either it would have to be imprecise or undefined.

Of course, nobody in practice could have a sigma-algebra of uncountably-many sets of real numbers in their head either, nor can anyone mentally represent probabilities to infinite real-valued precision. Orthodox Bayesianism also takes on these entirely impractical assumptions. The Vitali set was simply an example, but you could just as well use a non-measurable set to represent an agent’s imprecise credence in a prosaic proposition about which they have genuine ignorance, rather than mere uncertainty. I don’t see what’s problematic about that.

But that need not be because of how much money it is, but because money is fungible to other projects that we care about. E.g. maybe saving lives is infinitely valuable

People often declare that lives are infinitely valuable and then proceed to spend money on entertainment rather than donating it to save lives. Nearly no one actually behaves as if lives were infinitely valuable.

In practice, yes. It was just an example.