
Is the Future (Somewhat) Predictable? A Case for Treating Forecasting as a Skill

One view I hold - one I know many people here will be skeptical of - is that the future is partially predictable in a systematic way. Not in a deterministic or oracular sense, but in the limited, Tetlock-style sense of assigning calibrated probabilities to uncertain events and doing so better than baseline forecasters over time.

I’ve spent roughly the last 15 years trying to formalize and stress-test my own forecasting process. During that period, I’ve made public, timestamped predictions about events such as COVID, the Ukraine war, and various market movements. Some of these forecasts were wrong, some were directionally correct, and many were correct with meaningful lead time. Taken together, I think they at least suggest that forecasting can be treated as a learnable, improvable skill rather than an exercise in narrative hindsight.

When I’ve raised versions of this argument in the past (including in The Motte’s earlier Reddit incarnation), I’ve consistently encountered a few objections. I think these objections reflect reasonable priors, so I want to address them explicitly.

1 - “If prediction is possible, why aren’t the experts already doing it?”

My claim is not that expertise is useless, but that many expert institutions are poorly optimized for predictive accuracy. Incentives matter. Academia, media, and policy organizations tend to reward coherence, confidence, and alignment with prevailing narratives more than calibration or long-term scoring.

One reason I became interested in forecasting is that I appear to have unusually strong priors and pattern-recognition ability by objective measures. I’ve scored in the top 1% on multiple standardized exams (SAT, SHSAT, GMAT) on first attempts, which at least suggests above-average ability to reason under uncertainty and time pressure. That doesn’t make me infallible, but it does affect my prior that this might be a domain where individual skill differences matter.

Tetlock’s work also suggests that elite forecasting performance correlates less with formal credentials and more with specific cognitive habits: base-rate awareness, decomposition, active updating, and comfort expressing uncertainty numerically. These traits are not especially rewarded by most expert pipelines, which may explain why high-status experts often underperform trained forecasters.
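The "calibration" and "scoring" mentioned above have a standard formalization: the Brier score, the mean squared error between stated probabilities and realized outcomes. A minimal sketch (with made-up forecasts, not drawn from anyone's actual record) shows why a calibrated forecaster beats the know-nothing baseline of always saying 50%:

```python
# Brier score: mean squared error between forecast probabilities and outcomes.
# Lower is better; a constant "50% on everything" forecast scores exactly 0.25.
# The numbers below are illustrative only.

def brier_score(forecasts, outcomes):
    """forecasts: probabilities in [0, 1]; outcomes: 1 if the event occurred, else 0."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A calibrated forecaster vs. the ignorant baseline on four hypothetical questions.
calibrated = brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0])
baseline = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(calibrated, baseline)  # the calibrated forecaster scores well below 0.25
```

This is the sense in which forecasting can be "scored over time": the metric rewards honest probabilities, not confident narratives.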

My suspicion - very much a hypothesis, not a conclusion - is that many people in communities like this one are already better forecasters than credentialed experts, even if they don’t label what they’re doing as forecasting.

2 - “If you can forecast, why not just make money in markets?”

This is a fair question, since markets are one of the few environments where forecasts are continuously scored.

I have used forecasting methods in investing. Over the past five years, my average annual return has been approximately 40%, substantially outperforming major indices and comparable to or better than many elite hedge funds over the same period. This is net of mistakes, drawdowns, and revisions—not a cherry-picked subset.

That said, markets are noisy, capital-constrained, and adversarial. Forecasting ability helps, but translating probabilistic beliefs into portfolio construction, position sizing, and risk management is its own discipline. Forecasting is a necessary input, not a sufficient condition for success.
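To make the forecasting/position-sizing distinction concrete: one textbook bridge between a probability estimate and a bet size is the Kelly criterion. This is a standard formula, not my actual method, and the numbers are hypothetical:

```python
# Kelly criterion for a binary bet: f* = p - (1 - p) / b,
# where p is the estimated win probability and b the net odds (payout per unit staked).
# A textbook sizing rule, offered only to illustrate that the same forecast
# implies very different positions depending on the odds available.

def kelly_fraction(p, b):
    """Fraction of bankroll to stake; floored at 0 (never bet a negative edge)."""
    return max(0.0, p - (1 - p) / b)

# A 60% forecast on an even-money bet suggests staking about 20% of bankroll.
print(kelly_fraction(0.6, 1.0))  # ~0.2
# The same 60% forecast at worse odds (b = 0.5) has negative edge: stake nothing.
print(kelly_fraction(0.6, 0.5))  # 0.0
```

The point is that the identical probabilistic belief can imply a large position, a small one, or none at all; that translation layer is where much of the practical difficulty lives.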

More importantly, I don’t think markets are the only - or even the most interesting - application. Forecasting is at least as relevant to geopolitics, institutional risk, public health, and personal decision-making, where feedback is slower but the stakes are often higher.

3 - “Where are the receipts?”

That’s a reasonable demand. I’ve tried to make at least some predictions public and timestamped so they can be evaluated ex ante rather than reconstructed after the fact.

Here are a few examples where I laid out forecasts and reasoning in advance:

https://questioner.substack.com/p/more-stock-advice

https://questioner.substack.com/p/superforecasting-for-dummies-9a5

I don’t claim these constitute definitive proof. At best, they are auditable data points that can be examined, criticized, or falsified.

What I’m Actually Interested in Discussing

I’m not asking anyone to defer to my forecasts, and I’m not claiming prediction is easy or universally applicable. What I am interested in is whether superforecasting should be treated as a legitimate applied discipline—and, if so:

Where does it work reliably, and where does it fail?

How should forecasting skill be evaluated outside of markets?

What selection effects or survivorship biases should we worry about?

Can forecasting methods be exploited or weaponized?

What institutional designs would actually reward calibration over narrative?

If your view is that forecasting success is mostly an artifact of hindsight bias or selective memory, I’d be genuinely interested in stress-testing that claim. Likewise, if you think forecasting works only in narrow domains, I’d like to understand where you’d draw those boundaries and why.

I’m less interested in persuading anyone than in subjecting the model itself to adversarial scrutiny. Looking forward to hearing your thoughts.


This is a very thoughtful comment—thank you for taking the time to lay it out so clearly. Also, thanks for the reading recommendations; I’m familiar with Psychology of Intelligence Analysis, but I haven’t read all three you listed, and I appreciate the pointers. The intelligence-community framing is very much adjacent to how I think about this problem.

Let me try to respond to both the theoretical and practical questions in turn.

Theoretical question: what assumptions are superforecasters actually making?

I think your concern is a real one, and I don’t think there’s a fully satisfying, formally rigorous answer yet.

You’re right that most forecasting implicitly assumes something like: there exists a stable-enough probability distribution over futures that can be approximated and scored. And you’re also right that if the underlying distribution is heavy-tailed, discontinuous, or adversarial in the wrong ways, then many common scoring and evaluation methods can look “good” right up until they catastrophically fail. Finance is full of examples of exactly this dynamic.

Two clarifications about my own claims:

I did not use leverage. The 40% average annual return I mentioned was achieved without leverage. I agree completely that high apparent performance with hidden ruin risk is trivial to generate, and I’m very wary of arguments that don’t control for that.

I don’t have a clean statistical confidence interval for my forecasting ability. I wish I did. What I can say—without pretending it’s a theorem—is that when I pitched this approach to VCs last year, several were interested in investing on the order of ~$2M. That’s not proof of correctness, but it does suggest that sophisticated actors found the combination of reasoning and track record at least plausible. (For the record, I embarrassed myself by not having the proper licenses lined up before pitching a hedge fund idea, which is a lesson I learned the hard way.)

More broadly, I think the honest answer is that superforecasting rests on a weak ontological assumption rather than a strong one: not that the world is well-behaved, but that some environments are predictable enough, often enough, to beat naive baselines. The goal isn’t asymptotic optimality; it’s persistent edge.

Where I personally diverge from the “pure scoring-rule” framing is that I don’t think of forecasting as approximating a single global distribution. Instead, I think of it as model selection under uncertainty, where the models themselves are provisional and frequently discarded. That doesn’t fully resolve the Cauchy-vs-Gaussian problem you raise—but it does mean I’m less committed to any single assumed distribution than the formalism might suggest.
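One way to formalize "model selection under uncertainty" (a generic Bayesian-model-averaging sketch, not a description of my actual pipeline) is to keep weights over competing provisional models and update them multiplicatively by how well each one predicted the latest outcome:

```python
# Bayesian model averaging over provisional models: each model's weight is
# multiplied by the likelihood it assigned to the observed outcome, then
# renormalized. Models and data here are toy examples.

def update_weights(weights, likelihoods):
    """Reweight models by predictive accuracy and renormalize to sum to 1."""
    raw = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]

# Two toy models of a binary event: A predicts it with p=0.8, B with p=0.4.
weights = [0.5, 0.5]  # start agnostic
for outcome in [1, 1, 0, 1]:  # observed sequence, mostly occurrences
    lik_a = 0.8 if outcome else 0.2
    lik_b = 0.4 if outcome else 0.6
    weights = update_weights(weights, [lik_a, lik_b])
print(weights)  # model A now carries most of the weight
```

No single model is ever fully trusted; a model that keeps mispredicting is smoothly down-weighted and, in practice, discarded. That is the sense in which the commitment is to the selection process rather than to any one assumed distribution.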

Practical question: forecasting in a narrow, expert domain

Your North Korea example is excellent, and I agree with your diagnosis of the problem. If all you ask are first-order, low-entropy questions (“Will war break out this year?”), you get almost no learning signal, even if your answers are technically correct.

This is where my approach probably diverges from how most superforecasters would describe their own methods, and I want to be clear that I’m not claiming this is canonical.

Very roughly, my technique is to lean heavily on macro-level regularities and treat individuals as if they were particles—subject to incentives, constraints, and flows—rather than as unique narrative agents. At that level of abstraction, societies start to behave less like chess games and more like fluid systems. You can’t predict the motion of a single molecule, but you can often predict pressure gradients, bottlenecks, and phase transitions.

Applied to your case, that suggests focusing less on isolated facts (rice prices, phones) and more on questions that proxy for stress, throughput, and constraint relaxation. The exact phrasing matters less than whether the question sits on a causal pathway that connects to higher-level outcomes you care about.

You’re also right that the skill of asking good questions is the real bottleneck. My (imperfect) heuristic is to ask:

Does this variable aggregate many micro-decisions?

Is it constrained by hard resources or incentives?

Would a large deviation here force updates elsewhere?

Those questions won’t necessarily predict war directly—but they can tell you when the system is moving into a regime where war becomes more or less likely.

Finally, I agree with you that the intelligence community is one of the few places where calibration is actually rewarded rather than punished. In many ways, I think superforecasting is a partial rediscovery—by civilians—of techniques analysts have been developing for decades, albeit with better scoring and feedback loops.

I don’t think your concerns undermine forecasting as a practice. I think they correctly point out that it’s a tool with sharp edges, and that the hardest problems aren’t about probability math but about question selection, regime change, and institutional attention.

If you’re open to it, I’d actually be very interested in how you decide which NK-related variables are worth tracking at all—that feels like exactly the frontier Tetlock is gesturing at.

> If you’re open to it, I’d actually be very interested in how you decide which NK-related variables are worth tracking at all—that feels like exactly the frontier Tetlock is gesturing at.

I don't track anything in any quantifiable way right now. I mostly just browse the .kp TLD whenever I feel bored.