One view I hold - one I know many people here will be skeptical of - is that the future is partially predictable in a systematic way. Not in a deterministic or oracular sense, but in the limited, Tetlock-style sense of assigning calibrated probabilities to uncertain events and doing so better than baseline forecasters over time.
I’ve spent roughly the last 15 years trying to formalize and stress-test my own forecasting process. During that period, I’ve made public, timestamped predictions about events such as COVID, the Ukraine war, and various market movements. Some of these forecasts were wrong, some were directionally correct, and many were correct with meaningful lead time. Taken together, I think they at least suggest that forecasting can be treated as a learnable, improvable skill rather than an exercise in narrative hindsight.
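For concreteness, here is a minimal sketch of how a set of timestamped forecasts can be scored after resolution, using a Brier score. The forecasts and outcomes in it are invented placeholders, not my actual record:

```python
# Sketch: scoring timestamped probability forecasts with the Brier score.
# The forecasts and outcomes below are invented placeholders, not my actual record.

forecasts = [
    # (question, stated probability of "yes", resolution: 1 = happened, 0 = did not)
    ("Event A resolves yes by year end", 0.80, 1),
    ("Event B resolves yes by year end", 0.30, 0),
    ("Event C resolves yes by year end", 0.60, 1),
]

# Brier score: mean squared error between stated probability and outcome.
# 0.0 is perfect, an always-50% forecaster scores 0.25, lower is better.
brier = sum((p - outcome) ** 2 for _, p, outcome in forecasts) / len(forecasts)
print(f"Brier score: {brier:.3f}")  # ~0.097 for these made-up numbers
```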
When I’ve raised versions of this argument in the past (including in The Motte’s earlier Reddit incarnation), I’ve consistently encountered a few objections. I think these objections reflect reasonable priors, so I want to address them explicitly.
1 - “If prediction is possible, why aren’t the experts already doing it?”
My claim is not that expertise is useless, but that many expert institutions are poorly optimized for predictive accuracy. Incentives matter. Academia, media, and policy organizations tend to reward coherence, confidence, and alignment with prevailing narratives more than calibration or long-term scoring.
One reason I became interested in forecasting is that I appear to have unusually strong priors and pattern-recognition ability by objective measures. I’ve scored in the top 1% on multiple standardized exams (SAT, SHSAT, GMAT) on first attempts, which at least suggests above-average ability to reason under uncertainty and time pressure. That doesn’t make me infallible, but it does affect my prior that this might be a domain where individual skill differences matter.
Tetlock’s work also suggests that elite forecasting performance correlates less with formal credentials and more with specific cognitive habits: base-rate awareness, decomposition, active updating, and comfort expressing uncertainty numerically. These traits are not especially rewarded by most expert pipelines, which may explain why high-status experts often underperform trained forecasters.
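To make "active updating" concrete, here is a minimal sketch of revising a probability in odds space as evidence arrives. The starting probability and likelihood ratios are assumptions for illustration, not a recipe:

```python
def update(prob: float, likelihood_ratio: float) -> float:
    """One Bayesian update: multiply the prior odds by the likelihood ratio
    P(evidence | event) / P(evidence | no event), then convert back to a probability."""
    odds = prob / (1.0 - prob)
    posterior_odds = odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

p = 0.10            # assumed starting point: a 10% base rate (invented)
p = update(p, 3.0)  # evidence judged 3x more likely if the event is coming -> 0.25
p = update(p, 0.5)  # later evidence that cuts the other way -> ~0.14
print(f"current forecast: {p:.2f}")
```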
My suspicion - very much a hypothesis, not a conclusion - is that many people in communities like this one are already better forecasters than credentialed experts, even if they don’t label what they’re doing as forecasting.
2 - “If you can forecast, why not just make money in markets?”
This is a fair question, since markets are one of the few environments where forecasts are continuously scored.
I have used forecasting methods in investing. Over the past five years, my average annual return has been approximately 40%, substantially outperforming major indices and comparable to or better than many elite hedge funds over the same period. This is net of mistakes, drawdowns, and revisions—not a cherry-picked subset.
That said, markets are noisy, capital-constrained, and adversarial. Forecasting ability helps, but translating probabilistic beliefs into portfolio construction, position sizing, and risk management is its own discipline. Forecasting is a necessary input, not a sufficient condition for success.
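One standard illustration of that gap is the Kelly criterion: even a well-calibrated probability only becomes a position after an explicit sizing rule. This sketch uses made-up numbers and is not a description of my actual sizing rules:

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly fraction of bankroll for a bet paying b-to-1 with win probability p.
    A negative result means the bet has no edge and should be skipped."""
    return (p * b - (1.0 - p)) / b

p, b = 0.55, 1.0              # invented edge: 55% win probability at even odds
full_kelly = kelly_fraction(p, b)
position = 0.25 * full_kelly  # fractional Kelly, damping estimation error and drawdowns
print(f"full Kelly: {full_kelly:.1%}, quarter Kelly: {position:.1%}")  # 10.0% and 2.5%
```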
More importantly, I don’t think markets are the only - or even the most interesting - application. Forecasting is at least as relevant to geopolitics, institutional risk, public health, and personal decision-making, where feedback is slower but the stakes are often higher.
3 - “Where are the receipts?”
That’s a reasonable demand. I’ve tried to make at least some predictions public and timestamped so they can be evaluated ex ante rather than reconstructed after the fact.
Here are a few examples where I laid out forecasts and reasoning in advance:
https://questioner.substack.com/p/more-stock-advice
https://questioner.substack.com/p/superforecasting-for-dummies-9a5
I don’t claim these constitute definitive proof. At best, they are auditable data points that can be examined, criticized, or falsified.
What I’m Actually Interested in Discussing
I’m not asking anyone to defer to my forecasts, and I’m not claiming prediction is easy or universally applicable. What I am interested in is whether superforecasting should be treated as a legitimate applied discipline—and, if so:
Where does it work reliably, and where does it fail?
How should forecasting skill be evaluated outside of markets?
What selection effects or survivorship biases should we worry about?
Can forecasting methods be exploited or weaponized?
What institutional designs would actually reward calibration over narrative?
If your view is that forecasting success is mostly an artifact of hindsight bias or selective memory, I’d be genuinely interested in stress-testing that claim. Likewise, if you think forecasting works only in narrow domains, I’d like to understand where you’d draw those boundaries and why.
I’m less interested in persuading anyone than in subjecting the model itself to adversarial scrutiny. Looking forward to hearing your thoughts.

Notes -
I can say with a high degree of confidence that personal vehicles will still be an important part of society in 2080. This is the type of question that Tetlock/Kahneman-style reference class forecasting predicts well.
You also have the Nate Silver application of categorisation and data analysis. In a data-rich environment (sports, weather, crime) you can generate very accurate assessments about the future.
But you also have the Nassim Taleb category of forecasting: roughly, if you can think to ask the question in the first place, the answer probably isn't that useful. He emphasises the importance of unknown unknowns and thinks we over-invest in facts about the future that are already obvious.
Said another way, we have known knowns (facts), known unknowns (intelligence gaps) and unknown unknowns (intelligence gaps we haven't identified yet). We can do a good job with the first two: we can talk about them and determine a base rate for whether a nuclear bomb will be detonated in the next 10 years. We do a very bad job at answering the questions we don't know we need to ask, for obvious reasons.
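As a toy example of that kind of base-rate arithmetic (the annual rate here is invented, purely for illustration):

```python
# Toy base-rate arithmetic: converting an assumed annual rate into a 10-year probability.
annual_rate = 0.005                      # invented: 0.5% chance per year
ten_year = 1 - (1 - annual_rate) ** 10   # chance of at least one occurrence in 10 years
print(f"{ten_year:.1%}")                 # ~4.9%
```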
Tetlock has brought this up too. He says the next major research question is how to teach people to ask good questions to begin with. He doesn't use the intelligence-gap terminology, but that's how professional intelligence analysts interpret the point.
To put a finer point on it, you need to understand the problem of identifying these intelligence questions in the first place. In 2019, a lot of experts on diseases and pandemics would probably have rated the possibility of a global outbreak quite highly in relative terms. By 2021 they would have known their predictions about the standing risk had been correct. But if I were opening a cafe in 2019, I would not even have been able to ask the question about a global pandemic that would kill my business within two years. I did not have the information required to know what I didn't know, and all of the business analysis I did was useless as I watched the city shut society down for months at a time.
The main gap in the reference class forecasting technique is that foundational problem: knowing which questions are relevant to me in an environment of incomplete information.
It's one of the best tools we have. I've worked in intelligence for 15 years and can consistently give useful intuitive reactions just by having reference class forecasting in the back of my mind. A question will be thrown at us in a meeting and I intuitively stabilise the inside view against a baseline outside view. Most of the time, in most applications, the answer is to moderate the claims of people who are getting excited about all this new information coming in. You can do this with Iran today, you can do it with the ICE raids, you can do it with likely annual GDP.
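A crude sketch of that moderation step, with purely illustrative numbers and a weighting that is a judgment call rather than anything derived:

```python
def stabilised(inside_view: float, outside_view: float, weight_outside: float = 0.7) -> float:
    """Pull an excited inside-view probability back toward the outside-view base rate.
    The weighting is a judgment call, not something derived from theory."""
    return weight_outside * outside_view + (1 - weight_outside) * inside_view

# Invented numbers: a 60% inside view in the meeting, against a 15% historical base rate.
print(f"{stabilised(0.60, 0.15):.0%}")  # pulls the 60% claim down to about 28%
```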
But it is much harder to point the technique at extreme edge cases. Tetlock resists the suggestion that his technique is only really useful in environments with a lot of history to form the baseline, but in practical applications I've found this to be the case.
Again, I don't think this is too much of a flaw. The real problem is the initial questions. If something really important is going to happen next year, but nobody has the method or ability to identify that there are questions about it that need to be asked, the technique is useless. Everybody can argue over the arrival dates of general AI milestones, but it is incredibly difficult to identify black swans ahead of time. And I think most important, world-changing things hit people more like the cafe owners of 2019, who had no idea their whole business hinged on a lab in China not fucking up one day.
This is an excellent comment, and I largely agree with your taxonomy and framing. In particular, I think you’re exactly right that reference-class forecasting shines most when you have (a) stable baselines and (b) a well-posed question to begin with. Your distinction between known unknowns and unknown unknowns maps very cleanly onto where forecasting techniques feel powerful versus where they feel brittle in practice.
Your intelligence-analysis perspective also rings true to me. Using the outside view as a stabilizer against excited inside-view narratives is, in my experience, one of the highest-leverage applications of forecasting. In most real-world settings, the dominant failure mode isn’t underreaction but overreaction to new, salient information, and reference classes are a very effective corrective.
Where I’d push back slightly—and I mean this as a nuance rather than a rejection—is on COVID as an example of a true black swan in the Taleb sense.
I agree completely with your café-owner framing: for many individuals, COVID was effectively unaskable ex ante, and therefore indistinguishable from an unknown unknown. At the decision-maker level, it absolutely behaved like a black swan. That’s an important and underappreciated point.
However, at the system level, I’m less convinced it was unforeseeable. A number of people did, in fact, raise the specific risk in advance:
Bill Gates publicly warned in 2015 that global pandemic preparedness was dangerously inadequate and that a fast-moving virus was a more realistic threat than many conventional disaster scenarios.
The Wuhan Institute of Virology had been criticized multiple times prior to 2020 for operating at biosafety levels below what many thought appropriate for the research being conducted.
More broadly, pandemic risk had a nontrivial base rate in the epidemiology and biosecurity literature, even if the exact trigger and timing were unknown.
On a more personal note (and not meant as special pleading), I discussed viral and memetic contagion risks repeatedly in The Dark Arts of Rationality: Updated for the Digital Age, which was printed several months before COVID.
All of which is to say: COVID may not have been a black swan so much as a gray rhino—a high-impact risk that was visible to some, articulated by a few, but ignored by most institutions and individuals because it didn’t map cleanly onto their local decision models.
I think this distinction matters for forecasting as a discipline. It suggests that one of the core failures isn’t predictive ability per se, but attention allocation: which warnings get surfaced, amplified, and translated into actionable questions for the people whose decisions hinge on them. In that sense, I think you’re exactly right that Tetlock’s next frontier—teaching people how to ask better questions—is the crux.
So I’d summarize my position as: Forecasting works best in domains with history and well-posed questions, struggles at the edges, and fails catastrophically when important questions never get asked. But some events we label “unpredictable” may actually be predictable but institutionally invisible—which is a slightly different (and potentially more tractable) failure mode.
Curious whether that distinction resonates with your experience in intelligence work, or if you think I’m still underestimating the true weight of the unknown-unknown problem.