
Is the Future (Somewhat) Predictable? A Case for Treating Forecasting as a Skill

One view I hold - one I know many people here will be skeptical of - is that the future is partially predictable in a systematic way. Not in a deterministic or oracular sense, but in the limited, Tetlock-style sense of assigning calibrated probabilities to uncertain events and doing so better than baseline forecasters over time.

I’ve spent roughly the last 15 years trying to formalize and stress-test my own forecasting process. During that period, I’ve made public, timestamped predictions about events such as COVID, the Ukraine war, and various market movements. Some of these forecasts were wrong, some were directionally correct, and many were correct with meaningful lead time. Taken together, I think they at least suggest that forecasting can be treated as a learnable, improvable skill rather than an exercise in narrative hindsight.

When I’ve raised versions of this argument in the past (including in The Motte’s earlier Reddit incarnation), I’ve consistently encountered a few objections. I think these objections reflect reasonable priors, so I want to address them explicitly.

1 - “If prediction is possible, why aren’t the experts already doing it?”

My claim is not that expertise is useless, but that many expert institutions are poorly optimized for predictive accuracy. Incentives matter. Academia, media, and policy organizations tend to reward coherence, confidence, and alignment with prevailing narratives more than calibration or long-term scoring.

One reason I became interested in forecasting is that I appear to have unusually strong priors and pattern-recognition ability by objective measures. I’ve scored in the top 1% on multiple standardized exams (SAT, SHSAT, GMAT) on first attempts, which at least suggests above-average ability to reason under uncertainty and time pressure. That doesn’t make me infallible, but it does affect my prior that this might be a domain where individual skill differences matter.

Tetlock’s work also suggests that elite forecasting performance correlates less with formal credentials and more with specific cognitive habits: base-rate awareness, decomposition, active updating, and comfort expressing uncertainty numerically. These traits are not especially rewarded by most expert pipelines, which may explain why high-status experts often underperform trained forecasters.

My suspicion - very much a hypothesis, not a conclusion - is that many people in communities like this one are already better forecasters than credentialed experts, even if they don’t label what they’re doing as forecasting.

2 - “If you can forecast, why not just make money in markets?”

This is a fair question, since markets are one of the few environments where forecasts are continuously scored.

I have used forecasting methods in investing. Over the past five years, my average annual return has been approximately 40%, substantially outperforming major indices and comparable to or better than many elite hedge funds over the same period. This is net of mistakes, drawdowns, and revisions—not a cherry-picked subset.
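As a sanity check on what a claim like this implies: an arithmetic average return and a compound (geometric) growth rate are different things when returns vary year to year. A quick illustration with hypothetical yearly figures (not my actual returns):

```python
# Illustrative only: what a ~40% *average* annual return implies when compounded.
# The yearly figures below are hypothetical, chosen so they average 40%.
yearly_returns = [0.65, -0.10, 0.55, 0.40, 0.50]

arithmetic_mean = sum(yearly_returns) / len(yearly_returns)

growth = 1.0
for r in yearly_returns:
    growth *= 1 + r
cagr = growth ** (1 / len(yearly_returns)) - 1  # compound annual growth rate

print(f"arithmetic mean: {arithmetic_mean:.1%}")  # 40.0%
print(f"total growth:    {growth:.2f}x")
print(f"CAGR:            {cagr:.1%}")  # lower than the 40% arithmetic average
```

A steady 40% would compound to 1.4^5 ≈ 5.38x over five years; volatility drags the compound rate below the arithmetic average, which is why the two should not be conflated when evaluating a track record.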

That said, markets are noisy, capital-constrained, and adversarial. Forecasting ability helps, but translating probabilistic beliefs into portfolio construction, position sizing, and risk management is its own discipline. Forecasting is a necessary input, not a sufficient condition for success.

More importantly, I don’t think markets are the only - or even the most interesting - application. Forecasting is at least as relevant to geopolitics, institutional risk, public health, and personal decision-making, where feedback is slower but the stakes are often higher.

3 - “Where are the receipts?”

That’s a reasonable demand. I’ve tried to make at least some predictions public and timestamped so they can be evaluated ex ante rather than reconstructed after the fact.

Here are a few examples where I laid out forecasts and reasoning in advance:

https://questioner.substack.com/p/more-stock-advice

https://questioner.substack.com/p/superforecasting-for-dummies-9a5

I don’t claim these constitute definitive proof. At best, they are auditable data points that can be examined, criticized, or falsified.

What I’m Actually Interested in Discussing

I’m not asking anyone to defer to my forecasts, and I’m not claiming prediction is easy or universally applicable. What I am interested in is whether superforecasting should be treated as a legitimate applied discipline—and, if so:

Where does it work reliably, and where does it fail?

How should forecasting skill be evaluated outside of markets?

What selection effects or survivorship biases should we worry about?

Can forecasting methods be exploited or weaponized?

What institutional designs would actually reward calibration over narrative?

If your view is that forecasting success is mostly an artifact of hindsight bias or selective memory, I’d be genuinely interested in stress-testing that claim. Likewise, if you think forecasting works only in narrow domains, I’d like to understand where you’d draw those boundaries and why.

I’m less interested in persuading anyone than in subjecting the model itself to adversarial scrutiny. Looking forward to hearing your thoughts.


I would be surprised if anyone here doesn't believe that forecasting is possible. My gut says there would be nearly 100% agreement that psychohistory (Foundation series) is most likely true.

I would assume 50% of posters here could generate 40% annual returns on modest amounts of capital if they focused on earning their living from trading. It's not an impressive number, especially when you made zero mention of your risk system.

Jane Street literally recruited straight out of SSC message boards. And they do like $25 billion a year in pnl now (far less when they began recruiting from SSC).

It's not an impressive number, especially when you made zero mention of your risk system.

Sorry about that. It's minimal risk: I never buy using leverage.

And 40% was an impressive enough figure that when I pitched it at a VC gathering, a Swiss asset management company approached me about potentially investing in my fund.

Risk isn't necessarily just leverage. Something can be low risk and traded at 30x leverage (a long 9-year US Treasury versus short 10-year Treasury arb) or can be high risk with no leverage (an oil exploration company). Owning SaaS stocks unlevered since Jan 1 is down 25% YTD, for example.

Am I reading that right, you reckon half of people here should be able to nearly 6x their savings every 5 years?

Maybe a little less. Shape rotator brain types are the prototypical quant/prop trading personality. We do have some wordcels here.

In my experience about 50% of the prototypical shape rotators work out in trading. People always cite the statistic that 99% of retail traders fail. But the typical MIT 800-math-SAT kid can figure out a way to make money in markets. A lot of the people on the Motte fit that background.

With $1m, 3-6 months of guidance, and interest in markets, I think a person like that could find a way to make $400k over 12 months. At that scale you can do things like stare at less liquid small cap stocks and make a market, or trade lumber futures - both areas where you have a lot less competition from established players. This would be full-time work, not side employment managing a brokerage account in your free time.

People always cite 99% of retail traders fail. But the typical MIT 800 math SAT kid can figure out a way to make money in markets.

In a thread about reference class forecasting, I find it curious that you can throw out this comment with a straight face.

If you have, on average, 1 person in a hundred who is capable of making money, even a poster here who is five times as likely to succeed as a normie is still likely to fail.

I have worked in prop trading. The background Jane Street hires - math kids - seems to work out at a 50% rate.

It's like the NBA. A 7' kid has something like a 35% chance of making the NBA, while the average person's chance is not distinguishable from zero. We have a lot of 7' kids in the rationalist community.

I'll defer to your expertise but I personally find it hard to believe that a combo of scepticism and comp sci skills turn a 1% hire into a 50% hire.

What do you see as the actual attributes that lend themselves to forecasting "lumber futures" or whatever? I'm guessing there's some major filtering going on in the hiring process - like Tetlock only submitting his superforecasters' predictions, or something.

I'm sure you could take the top 50 candidates from the motte and they'd do well. But I doubt you'd be getting those results from a random sampling. Probably you beat the 1%, but you don't turn it into a 50%.

I might be too high with 50%. When I got into the business things were less automated, and tech may have lowered the success rate. I have tried to correct for this by giving ample capital ($1m) and saying they can play less scalable, less competitive games (small caps/smaller commodities), avoiding super-scaled trades like S&P 500 products where Jump etc. have automated.

I am sort of assuming the average Motte poster is basically SBF - the MIT/Stanford type. We probably have a few of those, but that is probably 1-2 tiers above the average poster in intellectual firepower.

I'm not proposing huge compensation. 50% on $1m is only $500k - not significantly more money than the same person taking a FAANG job and getting 10% on $1m.

There has been a discussion on X lately about different prop firms, classifying them as "Chicago" style or "MIT" style. Chicago style being quanty people who take more risk (sleep on positions), more yolo and vibe-based; MIT style being more pure algo trading. For the purposes of this discussion I am solely proposing Chicago style, since MIT style requires a lot of infrastructure. https://x.com/annanay/status/2012164686943261175/photo/1

I have a friend who a few years ago swapped from being "Chicago" to being "MIT". His Chicago days were more fun; his MIT days now are a more stable lifestyle. You do not sleep well holding positions for weeks, but programming algos is much more of a well-paid 9-5 job.

To not dox him too much: he's from an MIT/Stanford-type school. He spent 5-10 years trading, let's call it, cheese - at a firm, but mostly siloed, so he had to build out what he needed himself. I think he made $20m in those years. Out of college he couldn't get into the name-brand firms, but he did those trades and eventually transferred.

David Orr is an interesting Twitter follow. Jewish former poker pro; not sure he went to college. Obviously some quant ability if he made it in the poker world. Started fucking around in Japanese value stocks when poker died, and he fafo'd.

A lot of name-brand hedge funds/firms have founders with similar backgrounds, and the Motte has a high overlap with that group. Ken Griffin first started trading convertible bonds in his Harvard dorm room. Don Wilson just started showing up at the Board of Trade (UChicago). Tepper was a pro but got fired and just yolo'd his personal account (Jewish). 3Red Trading is a smaller Chicago firm, like a single pod (Jewish, chess, Northwestern). Jeff Yass (poker, Jewish). The list could go on.

Now, I started listing "Jewish" because a lot of the firms are Jewish, and it especially captures the ones without elite education. It's probably just capturing the same overrepresentation seen in Nobels and other intellectual achievements; it's the only category that sometimes comes without elite education too. Eliezer Yudkowsky would be the outside-the-industry example of someone without formal education.

I think there are certain factors you can identify that strongly signify someone would work out in trading: evidence of being good at math/logic, being Jewish, ability at competitive mind games like chess or poker. The equivalent of being 7' tall and being able to consider basketball a legitimate career path. Maybe 50% is too high, but if you start checking a few of the "all the successful people have this background" boxes, then your odds are far higher than the ~0% of retail traders who make money. I think the Motte has a lot of people who check these boxes - even more before rationalism went from a niche to a few percent of the population.

If you gave Scott Alexander a million dollars to trade prediction markets full time my gut says 12 months from now his pnl would be ok.

I wanted to describe Jane Street as always rationalist-adjacent, from memory. Grok is telling me no - it says the first public connection was some employees promoting "earn to give" in 2012.

I am more confident in the 50% success rate for the pre-2010 timeline. I think this sub is smart enough that a high percentage of posters could have made money in the pre-automation days. Floor trading days - yes. Early electronic days, before heavy automation - probably yes. Today, maybe we could get to 50% by deploying people into trading niches.

Interesting take. Cheers.

It seems like you think the motte has a major weighting towards that top 1% of prospective applicants, which is where you're getting your intuition from.

Frankly I find it hard to disagree. If I read 100 comments on Reddit on a given issue, probably 99 of them are totally retarded and incapable of demonstrating rational thought. If I read 100 comments on the Motte, every second one is at least somewhat sensible.

Where does it work reliably, and where does it fail?

I can say with a high degree of confidence that personal vehicles will be an important part of society in 2080. This is the type of question Tetlock/Kahneman-style reference class forecasting handles well.

You also have the Nate Silver application of categorising and data analysis. So in a data-rich environment (sports, weather, crime) you can generate very accurate assessments about the future.

But you also have the Nassim Taleb category of forecasting - e.g., if you can think to ask the question in the first place, it's probably not that useful. He emphasises the importance of unknown unknowns and thinks we over-invest in facts about the future that are obvious.

Said another way, we have known knowns (facts), known unknowns (intelligence gaps) and unknown unknowns (intelligence gaps we haven't identified yet). We can do a good job talking about the first two, and determine a base rate for whether a nuclear bomb is going to be detonated in the next 10 years. We do a very bad job at answering the questions we don't know we need to ask, for obvious reasons.

Tetlock has brought this up too. He says the next major research question is how to teach people to ask good questions to begin with. He doesn't use the intelligence gap terminology, but that's what professional intelligence analysts interpret this point as.

To put a point on this question, you kind of need to understand the problem with identifying these intelligence questions to begin with. In 2019, a lot of experts on diseases and pandemics would probably have rated the possibility of a global outbreak quite highly in relative terms. By 2021 they would have known their predictions about the standing risk had been correct. But if I were opening a cafe in 2019, I would not have been able even to ask the question about a global pandemic that would kill my business within two years. I did not have the information required to know what I didn't know, and all of the business analysis I did was useless as I watched the city shut down society for months at a time.

The main gap in the reference class forecasting technique is that foundational problem. Knowing which questions are relevant to me in an environment of incomplete information.

Likewise, if you think forecasting works only in narrow domains, I’d like to understand where you’d draw those boundaries and why.

It's one of the best tools we have. I've worked in intelligence for 15 years and can consistently give intuitive reactions just by having reference class forecasting in the back of my mind. A question will be thrown at us in a meeting and I intuitively stabilise the inside view against a baseline outside view. Most of the time, in most applications, the answer is to moderate the claims of people who are getting excited about all the new information coming in. You can do this with Iran today, you can do it with the ICE raids, you can do it with likely GDP growth per annum.

But it is much harder to point the technique at extreme edge cases. Tetlock concedes that it is better suited to environments with a lot of history to form the baseline, and in practical applications I've found this to be the case.

Again, I don't think this is too much of a flaw. The real problem is the initial questions. If there's something really important happening next year, but nobody has the method or ability to identify that there are questions about it that need to be asked, the technique is useless. Everybody can argue over the arrival dates of general-AI milestones, but it is incredibly difficult to identify black swans ahead of time. And I think the most important, world-changing things hit people more like cafe owners in 2019, who had no idea their whole business hinged on a lab in China not fucking up one day.

This is an excellent comment, and I largely agree with your taxonomy and framing. In particular, I think you’re exactly right that reference-class forecasting shines most when you have (a) stable baselines and (b) a well-posed question to begin with. Your distinction between known unknowns and unknown unknowns maps very cleanly onto where forecasting techniques feel powerful versus where they feel brittle in practice.

Your intelligence-analysis perspective also rings true to me. Using the outside view as a stabilizer against excited inside-view narratives is, in my experience, one of the highest-leverage applications of forecasting. In most real-world settings, the dominant failure mode isn’t underreaction but overreaction to new, salient information, and reference classes are a very effective corrective.

Where I’d push back slightly—and I mean this as a nuance rather than a rejection—is on COVID as an example of a true black swan in the Taleb sense.

I agree completely with your café-owner framing: for many individuals, COVID was effectively unaskable ex ante, and therefore indistinguishable from an unknown unknown. At the decision-maker level, it absolutely behaved like a black swan. That’s an important and underappreciated point.

However, at the system level, I’m less convinced it was unforeseeable. A number of people did, in fact, raise the specific risk in advance:

Bill Gates publicly warned in 2015 that global pandemic preparedness was dangerously inadequate and that a fast-moving virus was a more realistic threat than many conventional disaster scenarios.

The Wuhan Institute of Virology had been criticized multiple times prior to 2020 for operating at biosafety levels below what many thought appropriate for the research being conducted.

More broadly, pandemic risk had a nontrivial base rate in the epidemiology and biosecurity literature, even if the exact trigger and timing were unknown.

On a more personal note (and not meant as special pleading), I discussed viral and memetic contagion risks repeatedly in The Dark Arts of Rationality: Updated for the Digital Age, which was printed several months before COVID.

All of which is to say: COVID may not have been a black swan so much as a gray rhino—a high-impact risk that was visible to some, articulated by a few, but ignored by most institutions and individuals because it didn’t map cleanly onto their local decision models.

I think this distinction matters for forecasting as a discipline. It suggests that one of the core failures isn’t predictive ability per se, but attention allocation: which warnings get surfaced, amplified, and translated into actionable questions for the people whose decisions hinge on them. In that sense, I think you’re exactly right that Tetlock’s next frontier—teaching people how to ask better questions—is the crux.

So I’d summarize my position as: Forecasting works best in domains with history and well-posed questions, struggles at the edges, and fails catastrophically when important questions never get asked. But some events we label “unpredictable” may actually be predictable but institutionally invisible—which is a slightly different (and potentially more tractable) failure mode.

Curious whether that distinction resonates with your experience in intelligence work, or if you think I’m still underestimating the true weight of the unknown-unknown problem.

It suggests that one of the core failures isn’t predictive ability per se, but attention allocation: which warnings get surfaced, amplified, and translated into actionable questions for the people whose decisions hinge on them

I'm fine with this interpretation, but realistically I don't think you can meaningfully operationalise it. Obviously a cafe owner can't be expected to include pandemics, nuclear war, general-AI risks, alien invasions, etc. in their risk assessment. Sure, in an ideal world they would be able to click on a holistic risk portal website, punch in their circumstances, and know that on aggregate they have a 20% chance of a life-altering geopolitical incident ruining their business in the next 30 years. But...

However, at the system level, I’m less convinced it was unforeseeable. A number of people did, in fact, raise the specific risk in advance:

Bill Gates...

Nassim Taleb famously used an airliner crashing into his building as an example of a black swan. This was pre-9/11. People immediately used this example as a rebuttal of his wider claims about black swans, as you could argue he had imagined exactly the kind of world-changing incident that was about to alter the course of US history, end the 1990s, and shape global international relations for years to come.

But there's a huge difference between the functional use of these predictions and simply saying the words. Approximately zero cafe owners around the world had a "pandemic survival fund" in preparation for COVID-19. So it's fair to say that the industry as a whole, functionally, had no sense that this was a danger. Therefore it was a black swan to them, even though we had high-profile guys saying "this is a possibility."

You have a fundamental problem that most intelligence analysts will describe in excruciating detail: we looked at this problem, we knew it was a possibility, but the operational team responsible for the investigation/operation/follow-up didn't take the problem seriously. In a world with limited resources, we really can't expect ops teams to chase up every low-likelihood problem. If a steel mill in Vietnam blows up tomorrow, the manufacturer that makes my dad's decorative embellishments for his roulette-wheel company is going to raise prices. My dad doesn't run any analysis on the health and safety standards of a random steel mill in Vietnam. But he does look into, e.g., risks to the gambling industry where his products are sold. There's a level of reasonableness we can expect, and then a level so far-flung that we just can't - even if in hindsight it is clear that my dad's business hinges on a Vietnamese steel mill checking their power sockets every three months.

We live in a world with an infinity of risks, and every risk has another associated risk. With limited resources these risks quickly become black swans. Peter Thiel might be the type of guy with the resources to analyse a lot of these and prepare a subterranean refuge under a New Zealand mountain, but for the rest of us they're functionally black swans.

I'm even at odds with multiple federal and congressional investigations re: 9/11. My opinion is that they could not possibly have predicted 9/11 with the information they had. They would, by definition, have been wrong to predict planes flying into the Trade Towers as the most likely scenario - far more likely was a hijacking or a hostage situation. This goes back to the limitations of reference class forecasting, as discussed. We can enumerate flood and fire risks and insure ourselves against the eventuality. But if an asteroid hit a major city, it would be a black swan despite some astrophysicist having warned us for the last 10 years that we're underprepared for this eventuality.

The point is that there are two problems with forecasting the future. Too little information means we can't form an accurate baseline, even if we know to ask the question to begin with. And too much information results in us being functionally incapable of analysing it due to limited resources.

Tetlock really focuses on accurately forecasting, e.g., flood risks. Taleb focuses on the asteroid hitting the middle of NYC. These concepts dovetail very nicely, and both need to be understood by superforecasters and professional analysts.

Curious whether that distinction resonates with your experience in intelligence work, or if you think I’m still underestimating the true weight of the unknown-unknown problem.

It does and I agree with what you're saying overall.

I think identifying intelligence gaps is a massively underexplored area - extremely underexplored, and frighteningly so. As a field, we are very good at counting tanks, tracking submarines, analysing how to win battles. And as a field we're pretty good at filling existing information or intelligence gaps. Anybody can send a spy into an economic forum to take a picture of a tank through their pocket.

But nobody I know of is working on that unknown-unknown question: determining which intelligence gaps we're not aware of. Intelligence is very reactive to new problems, but not very proactive in getting ahead of them before they become problems. This is bad, because those unidentified intelligence problems have a disproportionate effect on world events.

Ponderously telling people that there's a 5% chance of China invading Taiwan is useful, but right now the entire Danish political corps is chain smoking outside the government offices in Denmark (https://youtube.com/shorts/L-Rr9F9g_VM) because they weren't positioned for this possibility by their intelligence services. This is essentially a black swan by all meaningful definitions.

COVID-19 the disease was eminently predictable - we have novel seasonal respiratory virus outbreaks all the time; something like a 10%/a prediction would be not too bad.

The COVID-19 response (which is the risk that bit cafe owners' collective ass) would be pretty hard to predict, given that it was completely unimaginable in 2019 - maybe a generalized "massive Four Horsemen-related social disruption in my area" @ 1%/a would work, but this doesn't exactly seem like what we are after when talking about forecasting skillz.
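For context on the numbers above: an annual hazard rate compounds over a horizon, so even modest per-year probabilities add up over a decade. A quick check:

```python
# Probability of at least one event over 10 years, assuming independent years
# with a constant annual hazard (the 10%/a and 1%/a figures from above).
def p_over_decade(p_annual: float, years: int = 10) -> float:
    return 1 - (1 - p_annual) ** years

p_decade = p_over_decade(0.10)
print(f"10%/a over a decade: {p_decade:.1%}")   # → 65.1%
print(f" 1%/a over a decade: {p_over_decade(0.01):.1%}")
```

So a 10%/a outbreak rate implies roughly a two-in-three chance of seeing at least one outbreak per decade, which is why even "unsurprising" base rates can feel like black swans to individual decision-makers.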

I will answer your questions with two questions of my own. (The questions are semi-rhetorical in that I think they shed light on the answers to your questions, but also I would genuinely really like answers and I haven't seen any good answers.)


Theoretical Q.

I overall like the forecasting trend in the rationalist community. I find the idea of quantifying bias and uncertainty to be a valuable exercise that I have benefited from personally. I have a theory-level concern, however, that I've never seen properly addressed.

I internally model forecasting as: there exists a probability distribution over all possible futures, and the job of the forecaster is to approximate this distribution. In practice, forecasters do this by assigning probabilities to a bunch of events and then scoring themselves based on what actually happens (like you describe in your OP).

So here's my question: how confident can we actually be that your scoring algorithms are stable and consistent? I'm using these words in the technical sense from statistics. To see how everything can go bad: say you're trying to predict the number of people who die in 2026. If the true distribution of deaths/year is Gaussian, you can use standard formulas to compute the mean and get a good estimate with error bars. But if the true distribution is Cauchy, the mean is undefined, and there is provably no way to accurately estimate it, because it doesn't exist. A Cauchy distribution looks essentially identical to a Gaussian, and in practice it is extremely difficult to determine which one you are actually sampling from. People who work under the Gaussian assumption will look like they're doing very well by the metrics superforecasters use, until suddenly they have a disaster (see e.g. the 2008 financial collapse). Similarly, a 40% annual return over 5 years is "trivial" to achieve if you allow yourself a very high risk of ruin: just invest in the S&P 500 with 5x leverage.
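The Gaussian-vs-Cauchy instability described above is easy to demonstrate numerically. A minimal sketch (standard library only, seeded for reproducibility) comparing the running sample mean of the two distributions:

```python
import math
import random

random.seed(0)

def cauchy_sample() -> float:
    # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2))
    return math.tan(math.pi * (random.random() - 0.5))

def running_means(draw, n: int) -> list[float]:
    # Mean of the first i draws, for i = 1..n
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += draw()
        means.append(total / i)
    return means

n = 100_000
gauss = running_means(lambda: random.gauss(0, 1), n)
cauchy = running_means(cauchy_sample, n)

# The Gaussian running mean settles near 0 by the law of large numbers.
# The Cauchy running mean keeps jumping no matter how large n gets,
# because there is no population mean for it to converge to.
print("gaussian running mean after n draws:", round(gauss[-1], 4))
print("cauchy running mean after n draws:  ", round(cauchy[-1], 4))
```

Plotting the two sequences makes the point even more vividly: the Gaussian curve flattens out, while the Cauchy curve shows occasional huge jumps whenever a single extreme draw lands.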

So what are the actual, philosophical and statistical assumptions about the universe that superforecasters are relying on?


Practical Q

I work professionally with North Korea. I put in a lot of time studying their culture, geopolitics, language, etc in order to make my professional work more effective. I've long thought about how to quantify this work both to make my work even more effective and to convince other people that I am an expert on this topic. How do I go about as a practical matter starting to forecast on a very niche topic like this?

My impression is that most forecasters work very generally, basically trying to eke out an edge over the general populace by (like you mention) not being fooled by basic statistical fallacies. This lets forecasters make more level-headed judgements about a wide range of topics, most of which are well-established questions that normies also think about (who will win the election? will an epidemic cause a downturn in the economy? etc.).

But I am interested only in a very narrow domain where there are basically no established questions to ask. With regards to North Korea, the basic questions might be:

  • Will Kim Jong Un die this year? (Almost certainly no; without looking it up, I'd guess the actuarial tables put him as <5% chance of death.)
  • Will the North and South declare war? (Also almost certainly no; I'd put it <1%.)
  • Will the North and South have a military skirmish? (Happens 1-2 times per decade, so let's say 20%)
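For what it's worth, even a handful of questions like these can be scored once they resolve; the standard tool is the Brier score (mean squared error of the forecast probability against the 0/1 outcome; lower is better). A minimal sketch using the probabilities from the list above, with hypothetical resolutions:

```python
# Brier score over a small forecast set. Probabilities are from the list
# above; the 0/1 outcomes are hypothetical resolutions for illustration.
forecasts = [
    ("Kim Jong Un dies this year",          0.05, 0),
    ("North and South declare war",         0.01, 0),
    ("North/South military skirmish",       0.20, 0),
]

brier = sum((p - outcome) ** 2 for _, p, outcome in forecasts) / len(forecasts)
print(f"Brier score: {brier:.4f}")  # → 0.0142
```

A maximally uninformative forecaster (p = 0.5 on everything) scores 0.25, so scores well below that reflect real information - but with n = 3 the sample is far too small to distinguish skill from luck, which is exactly the "large enough n" problem raised above.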

But these are all super basic questions that anyone moderately politically aware could reasonably answer. There's no opportunity for me to develop my skill with questions like this, and there's not a "large enough n" for me to meaningfully test my skill. So I need to develop more detailed questions if I want to really improve my forecasting ability. But how? Some more detailed questions could be:

  • Will the North develop a new fully domestic cell phone in 2026? (I'd say 75% probability since they've been developing them the past few years. But then what exactly counts as "new" and what exactly counts as "fully domestic"?)
  • What will the price of rice be in Jan 2027? (It's currently about 1,800 won per kilo. I predict it will be <2,200 won in one year with 75% probability. Either a bad crop this year or more economic sanctions from the US could increase the price substantially, and I'll say that the union of those two events is about 25% probable.)

But how do I go about actually creating good questions like this? You especially want the questions to be correlated with the "basic"/important questions above, but it's not at all clear to me that the ability to predict food prices is at all related to the ability to predict whether and how large of a military conflict there will be.


One last aside: You don't mention the intelligence community at all. This is where calibrated predictions are rewarded more than narrative, and this is where people who actually want to work as superforecasters work. Some "fun" reading if you haven't already seen them are:

  1. "Psychology of Intelligence Analysis"

  2. "A Tradecraft Primer: Structured Analytic Techniques for Improving Intelligence Analysis"

  3. "Analytic Culture in the U.S. Intelligence Community"

These are all declassified CIA publications you can get from cia.gov. Most of my questions/frustrations expressed above are things that I've thought about from reading these works and talking to the people who use them professionally.

This is a very thoughtful comment—thank you for taking the time to lay it out so clearly. Also, thanks for the reading recommendations; I’m familiar with Psychology of Intelligence Analysis, but I haven’t read all three you listed, and I appreciate the pointers. The intelligence-community framing is very much adjacent to how I think about this problem.

Let me try to respond to both the theoretical and practical questions in turn.

Theoretical question: what assumptions are superforecasters actually making?

I think your concern is a real one, and I don’t think there’s a fully satisfying, formally rigorous answer yet.

You’re right that most forecasting implicitly assumes something like: there exists a stable-enough probability distribution over futures that can be approximated and scored. And you’re also right that if the underlying distribution is heavy-tailed, discontinuous, or adversarial in the wrong ways, then many common scoring and evaluation methods can look “good” right up until they catastrophically fail. Finance is full of examples of exactly this dynamic.

Two clarifications about my own claims:

I did not use leverage. The 40% average annual return I mentioned was achieved without it. I agree completely that high apparent performance with hidden ruin risk is trivial to generate, and I'm very wary of arguments that don't control for that.

I don’t have a clean statistical confidence interval for my forecasting ability. I wish I did. What I can say—without pretending it’s a theorem—is that when I pitched this approach to VCs last year, several were interested in investing on the order of ~$2M. That’s not proof of correctness, but it does suggest that sophisticated actors found the combination of reasoning and track record at least plausible. (For the record, I embarrassed myself by not having the proper licenses lined up before pitching a hedge fund idea, which is a lesson I learned the hard way.)

More broadly, I think the honest answer is that superforecasting rests on a weak ontological assumption rather than a strong one: not that the world is well-behaved, but that some environments are predictable enough, often enough, to beat naive baselines. The goal isn’t asymptotic optimality; it’s persistent edge.

Where I personally diverge from the “pure scoring-rule” framing is that I don’t think of forecasting as approximating a single global distribution. Instead, I think of it as model selection under uncertainty, where the models themselves are provisional and frequently discarded. That doesn’t fully resolve the Cauchy-vs-Gaussian problem you raise—but it does mean I’m less committed to any single assumed distribution than the formalism might suggest.
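The Cauchy-vs-Gaussian point can be made concrete with a small simulation (my illustration, not anything from the original exchange): the sample mean of Gaussian draws concentrates as n grows, while the sample mean of Cauchy draws never settles, because the Cauchy distribution has no mean at all. A track record evaluated under a Gaussian assumption can therefore look stable right up until one tail event dominates everything.

```python
import math
import random
import statistics

random.seed(0)

def running_means(draw, ns):
    """Sample means over progressively larger prefixes of one sample stream."""
    samples = [draw() for _ in range(max(ns))]
    return [statistics.fmean(samples[:n]) for n in ns]

ns = [100, 1_000, 10_000, 100_000]

# Standard normal: means shrink toward 0 like 1/sqrt(n).
gaussian = running_means(lambda: random.gauss(0, 1), ns)

# Standard Cauchy via inverse CDF: tan(pi * (U - 1/2)) for uniform U.
# Its sample mean is itself Cauchy-distributed, so it typically keeps
# jumping around no matter how large n gets.
cauchy = running_means(lambda: math.tan(math.pi * (random.random() - 0.5)), ns)

print(gaussian)
print(cauchy)
```

This is the sense in which scoring methods can "look good until they catastrophically fail": averaging is only informative when the underlying distribution cooperates.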

Practical question: forecasting in a narrow, expert domain

Your North Korea example is excellent, and I agree with your diagnosis of the problem. If all you ask are first-order, low-entropy questions (“Will war break out this year?”), you get almost no learning signal, even if your answers are technically correct.

This is where my approach probably diverges from how most superforecasters would describe their own methods, and I want to be clear that I’m not claiming this is canonical.

Very roughly, my technique is to lean heavily on macro-level regularities and treat individuals as if they were particles—subject to incentives, constraints, and flows—rather than as unique narrative agents. At that level of abstraction, societies start to behave less like chess games and more like fluid systems. You can’t predict the motion of a single molecule, but you can often predict pressure gradients, bottlenecks, and phase transitions.

Applied to your case, that suggests focusing less on isolated facts (rice prices, phones) and more on questions that proxy for stress, throughput, and constraint relaxation. The exact phrasing matters less than whether the question sits on a causal pathway that connects to higher-level outcomes you care about.

You’re also right that the skill of asking good questions is the real bottleneck. My (imperfect) heuristic is to ask:

  • Does this variable aggregate many micro-decisions?
  • Is it constrained by hard resources or incentives?
  • Would a large deviation here force updates elsewhere?

Those questions won’t necessarily predict war directly—but they can tell you when the system is moving into a regime where war becomes more or less likely.

Finally, I agree with you that the intelligence community is one of the few places where calibration is actually rewarded rather than punished. In many ways, I think superforecasting is a partial rediscovery—by civilians—of techniques analysts have been developing for decades, albeit with better scoring and feedback loops.

I don’t think your concerns undermine forecasting as a practice. I think they correctly point out that it’s a tool with sharp edges, and that the hardest problems aren’t about probability math but about question selection, regime change, and institutional attention.

If you’re open to it, I’d actually be very interested in how you decide which NK-related variables are worth tracking at all—that feels like exactly the frontier Tetlock is gesturing at.

If you’re open to it, I’d actually be very interested in how you decide which NK-related variables are worth tracking at all—that feels like exactly the frontier Tetlock is gesturing at.

I don't track anything in any quantifiable way right now. I mostly just browse the .kp tld whenever I feel bored.

Some "fun" reading if you haven't already seen them are

I've been working in intelligence for 15 years and have read all of these books, and others from the canon.

I can say that none of these should be recommended now that Tetlock's work has been published. Some techniques from the structured analytic techniques toolbox don't work; some definitely do more harm than good. And the continued teaching of these techniques in place of e.g. reference class forecasting is so baffling to me that I can't express my frustration.
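For readers unfamiliar with the term: reference class forecasting means anchoring on the base rate of a class of similar past cases, then adjusting modestly for case-specific evidence. A minimal sketch, with counts and the adjustment factor invented purely for illustration:

```python
# Reference class forecasting: anchor on the historical base rate of
# similar events rather than on a narrative about this specific case.

def base_rate(events, years):
    """Naive annual probability estimated from a historical reference class."""
    return events / years

# Hypothetical reference class: inter-Korean military skirmishes.
# The counts are invented for illustration.
prior = base_rate(events=5, years=30)    # ≈ 0.17 per year

# Adjust modestly for current conditions; the multiplier is an
# assumption, not a measured quantity.
tension_multiplier = 1.2
forecast = min(prior * tension_multiplier, 1.0)

print(round(forecast, 2))  # 0.2
```

The discipline is in treating the base rate as the default and requiring evidence to move away from it, rather than starting from the story and working backward.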

Heuer seems like a good guy who was doing his best to fix the terrible problems in the CIA at the time. But he's been superseded as an authority on this subject since Kahneman came along. Kahneman did experiments and knew the research; Heuer was going off personal observations from his career, and suggested analytical techniques that were a little better than the gut feels of the Ivy League scotch-swilling guys in the agency through the Cold War. He has been very accepting of new research and basically says, "I did my best to formalise analysis; if others can come along and do better, that's fantastic."

Since that time, we've come a long way.

R Pherson, on the other hand, is just scum. He refuses to acknowledge that the techniques in his books have no scientific basis. They've been directly measured across various studies, and he essentially argues that they do, in fact, work, despite their failing on every meaningful metric. He's made a career on the lecture circuit and knows that backing down would undermine his financial base.

The real indicator that these techniques don't work is that nobody on the planet actually uses them. If they improved outcomes, people would. Instead, everybody goes to these week-long courses, learns how to do a bullshit mind map or analysis of competing hypotheses, gets signed off as certified, and never looks at the techniques again.