The cascade
A cohort of 1,000 people flows left to right: who has the disease, what the test says, who gets treated, and how they end up. Band height = number of people.
What happens to everyone
Each square is one person, colored by how they end up.
The test
2×2 confusion matrix
Counts for 1,000 people. Sensitivity reads across the disease row; PPV reads down the test-positive column — they look at the table from perpendicular directions.
| | Test + | Test − | Total |
|---|---|---|---|
| Disease + | TP = 9 | FN = 1 | 10 |
| Disease − | FP = 99 | TN = 891 | 990 |
| Total | 108 | 892 | 1,000 |
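- Sensitivity = TP ÷ (TP + FN), read across the disease + row →
- Specificity = TN ÷ (TN + FP), read across the disease − row →
- PPV (positive predictive value) = TP ÷ (TP + FP), read down the test + column ↓
- NPV (negative predictive value) = TN ÷ (TN + FN), read down the test − column ↓
- LR+ (positive likelihood ratio) = sensitivity ÷ (1 − specificity)
- LR− (negative likelihood ratio) = (1 − sensitivity) ÷ specificity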
Probability tree
Split the population by disease, then by test result. Each path multiplies to a joint probability; Bayes' theorem then simply compares the two test-positive leaves.
P(disease | test +) = (1% × 90%) ÷ (1% × 90% + 99% × 10%) = 0.90% ÷ (0.90% + 9.9%) ≈ 8.3%
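The tree arithmetic fits in a few lines of Python. The inputs (1% prevalence, 90% sensitivity, 90% specificity) are the rates implied by the table's counts; `tree_leaves` is a hypothetical helper name used only for illustration.

```python
def tree_leaves(prevalence: float, sensitivity: float,
                specificity: float) -> dict[tuple[str, str], float]:
    """Joint probability of each (disease, test) leaf: multiply along the path."""
    p_sick = prevalence
    p_healthy = 1 - prevalence
    return {
        ("disease+", "test+"): p_sick * sensitivity,
        ("disease+", "test-"): p_sick * (1 - sensitivity),
        ("disease-", "test+"): p_healthy * (1 - specificity),
        ("disease-", "test-"): p_healthy * specificity,
    }


leaves = tree_leaves(prevalence=0.01, sensitivity=0.90, specificity=0.90)
true_pos = leaves[("disease+", "test+")]    # about 0.009 (0.90%)
false_pos = leaves[("disease-", "test+")]   # about 0.099 (9.9%)

# Bayes: the share of all test-positive probability that comes from the sick branch.
p_disease_given_pos = true_pos / (true_pos + false_pos)
print(f"P(disease | test +) = {p_disease_given_pos:.1%}")
```

The four leaves sum to 1, and the result matches the PPV of 9 ÷ 108 from the table.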
How PPV collapses with prevalence
Holding sensitivity and specificity fixed, what a positive result is worth (its PPV) depends heavily on how common the disease is. The dashed line marks the prevalence used above (1%).
The base-rate fallacy: at 1% prevalence, even a test with 90% sensitivity and 90% specificity gives a positive result that is correct only about 8% of the time. A test's accuracy is not the whole story; the base rate matters just as much.
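The curve comes from one formula. This sketch holds sensitivity and specificity at 90% and sweeps prevalence over a few arbitrary sample points (not data); the function name `ppv` is illustrative.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity stay at 90%; only prevalence changes.
for prevalence in (0.001, 0.01, 0.05, 0.10, 0.30, 0.50):
    print(f"prevalence {prevalence:>5.1%} -> PPV {ppv(prevalence, 0.90, 0.90):.1%}")
```

PPV is under 1% at 0.1% prevalence, about 8.3% at 1%, and only reaches 50% at 10%, where true and false positives are equally common.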
Fagan nomogram
The likelihood ratio is the lever that turns a pre-test probability into a post-test one, and unlike PPV it does not depend on prevalence. The line shown is for a positive result (LR+).
This is Bayes' theorem in odds form: prior odds × likelihood ratio = posterior odds. The LR measures the strength of the evidence; as a rule of thumb, LR+ > 10 “rules in” and LR− < 0.1 “rules out”.
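The odds-form update is a few lines of arithmetic. This sketch assumes the 90% sensitivity and specificity used throughout; `post_test_probability` is an illustrative name, not a library function.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posterior odds = prior odds x LR."""
    prior_odds = pre_test / (1 - pre_test)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)   # odds back to probability


sensitivity, specificity = 0.90, 0.90
lr_pos = sensitivity / (1 - specificity)   # about 9
lr_neg = (1 - sensitivity) / specificity   # about 0.11

print(f"after a positive: {post_test_probability(0.01, lr_pos):.1%}")
print(f"after a negative: {post_test_probability(0.01, lr_neg):.2%}")
```

From a 1% pre-test probability, a positive result lands at 1/12 ≈ 8.3%, the same as the PPV, and a negative result at 1/892 ≈ 0.11%, which is 1 − NPV. This test's LR+ of 9 and LR− of about 0.11 fall just short of both rule-of-thumb thresholds.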
The treatment
Treatment outcomes
108 interventions performed — 99 on people who never had the disease and so could not benefit.
Per 1,000 screened: about 1 helped, 5 harmed by treatment, 99 false alarms, and 1 missed.
Watch the relative-versus-absolute trap: a large “relative risk reduction” can still mean a large number needed to treat (NNT = 1 ÷ absolute risk reduction) when the baseline risk is low. A benefit (NNT) only means something next to its harms, which is why helped and harmed are always shown on the same denominator here.
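To make the trap concrete, here is a sketch with entirely hypothetical numbers: a treatment with a fixed 50% relative risk reduction applied at three baseline risks, plus an assumed 1% rate of treatment harm for comparison. None of these figures come from the cascade above.

```python
def number_needed(risk_difference: float) -> float:
    """NNT (benefit) or NNH (harm): 1 / absolute risk difference."""
    return 1 / risk_difference


RELATIVE_RISK_REDUCTION = 0.50   # hypothetical: treatment halves the risk
HARM_RATE = 0.01                 # hypothetical: 1% of treated people are harmed

for baseline_risk in (0.20, 0.02, 0.002):
    arr = baseline_risk * RELATIVE_RISK_REDUCTION   # absolute risk reduction
    nnt = number_needed(arr)
    nnh = number_needed(HARM_RATE)
    print(f"baseline {baseline_risk:.1%}: ARR {arr:.2%}, NNT {nnt:.0f}, NNH {nnh:.0f}")
```

The relative reduction never changes, yet the NNT grows from 10 to 1,000. At the lowest baseline, about ten people would be harmed for every one helped.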
Repeat testing
Serial testing — the false-alarm pile-up
Repeat a test on a healthy person and the chance of at least one false alarm climbs as 1 − specificityⁿ after n rounds. This assumes the rounds are independent. Real repeat tests are correlated, which usually makes the true figure lower, while testing for many conditions at once adds more chances each round, which can make it higher.
Reference points (risk of at least one false positive):
- Elmore et al., NEJM 1998: 49% after 10 mammograms
- Croswell et al., Ann Fam Med 2009: ~60% (men) and 49% (women) after 14 multimodal tests in the PLCO (Prostate, Lung, Colorectal, and Ovarian) Cancer Screening Trial
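The independence formula, 1 − specificityⁿ, is easy to tabulate. This sketch assumes independent rounds exactly as the formula does, so it carries the same caveats; the specificities and round counts are arbitrary examples, and `p_any_false_positive` is an illustrative name.

```python
def p_any_false_positive(specificity: float, rounds: int) -> float:
    """Chance of >= 1 false positive in `rounds` independent tests of a healthy person."""
    # Probability of staying negative every round is specificity ** rounds.
    return 1 - specificity ** rounds


for spec in (0.90, 0.95, 0.99):
    risks = [p_any_false_positive(spec, n) for n in (1, 5, 10, 14)]
    print(f"specificity {spec:.0%}:", ", ".join(f"{r:.0%}" for r in risks))
```

At 90% specificity the chance passes 65% by round 10. Comparing such figures with the observed reference points is only a rough check, since real screening rounds are neither independent nor identical.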
Bayesian updating
Bayesian updating: learning a rate from data
Where do numbers like sensitivity or prevalence come from? You start with a prior belief, observe data, and get a posterior. For a rate, the update is exact and needs no simulation: with a Beta(α, β) prior and k positives out of n observations, the posterior is Beta(α + k, β + n − k).
The posterior mean sits between your prior mean and the observed rate k/n; the more data you collect, the more the data dominates and the narrower the interval. That shrinking uncertainty is what the screening calculations above quietly assume away by treating each rate as a fixed number.
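A stdlib-only sketch of the conjugate update, assuming a hypothetical Beta(2, 2) prior and a rate such as sensitivity estimated from k true positives among n people with the disease. It reports the posterior mean and standard deviation; an exact credible interval would need Beta quantiles (for example `scipy.stats.beta.ppf`), left out here to keep the example dependency-free.

```python
import math


def beta_update(alpha: float, beta: float, k: int, n: int) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + k successes in n trials."""
    return alpha + k, beta + (n - k)


def beta_mean_sd(alpha: float, beta: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


PRIOR = (2.0, 2.0)  # hypothetical weak prior, centred on 50%

# Same observed rate (90%) with ten times more data at each step.
for k, n in [(9, 10), (90, 100), (900, 1000)]:
    post_alpha, post_beta = beta_update(*PRIOR, k, n)
    mean, sd = beta_mean_sd(post_alpha, post_beta)

    # Posterior mean = weighted average of prior mean and k/n,
    # with weights (alpha + beta) and n respectively.
    prior_weight = sum(PRIOR)
    blended = (prior_weight * (PRIOR[0] / prior_weight) + n * (k / n)) / (prior_weight + n)
    assert math.isclose(mean, blended)

    print(f"n={n:4d}: Beta({post_alpha:g}, {post_beta:g}), mean {mean:.3f}, sd {sd:.3f}")
```

As n grows, the mean moves from about 0.79 toward the observed 0.90 and the standard deviation shrinks by roughly a factor of three with each tenfold increase in data.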