PPV, Sensitivity & Specificity Calculator — Bayes, LR, NNT | Worth the Test

About this tool — what it is and how to read it

Screening and treatment decisions hinge on a few numbers — how common a condition is, how accurate the test is, and how much the treatment helps or harms. Those numbers interact in ways that are easy to misjudge, especially when a condition is rare.

This tool lets you explore those numbers. Set a population, a disease prevalence, a test's sensitivity and specificity, and a treatment's benefit and harm, then watch a single cohort flow from population → test → treatment → outcome: true and false positives, predictive value, and how many people end up helped, harmed, or unchanged. Plug in your own figures, or use the cited examples next to each slider.

Key terms

Prevalence (pre-test probability): How common the condition is in the group being tested. The starting point for everything downstream.
Sensitivity: Of people who have the condition, the share the test correctly flags positive.
Specificity: Of people who do not have it, the share the test correctly clears.
PPV / NPV: Given a positive (or negative) result, the chance it is right. Unlike sensitivity/specificity, these depend heavily on prevalence.
Likelihood ratio (LR): How much a result shifts the odds. LR+ for a positive, LR− for a negative — and they do not depend on prevalence.
NNT / NNH: Number needed to treat for one person to benefit; number needed to harm for one to be harmed.
NNS: Number needed to screen for one person to be helped — screening, testing, treatment, and outcomes all folded together.

The cascade

A cohort of 1,000 flows left to right: who has the disease, what the test says, who gets treated, and how they end up. Band height = number of people.

What happens to everyone

Each square is one person, colored by how they end up.

The test

2×2 confusion matrix

Counts for 1,000 people. Sensitivity reads across the disease row; PPV reads down the test-positive column — they look at the table from perpendicular directions.

	Test +	Test −	Total
Disease +	TP = 9	FN = 1	10
Disease −	FP = 99	TN = 891	990
Total	108	892	1,000
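The four cells can be derived from the inputs in a few lines. This is a minimal Python sketch, assuming expected counts are rounded to whole people (the tool's own rounding rule is an assumption here); `confusion_counts` is a hypothetical helper, not the tool's code.

```python
def confusion_counts(population: int, prevalence: float,
                     sensitivity: float, specificity: float) -> dict[str, int]:
    """Expected 2x2 counts, rounded to whole people (rounding rule is an assumption)."""
    diseased = round(population * prevalence)  # disease + row total
    healthy = population - diseased            # disease − row total
    tp = round(diseased * sensitivity)         # sick and flagged
    fn = diseased - tp                         # sick but missed
    tn = round(healthy * specificity)          # healthy and cleared
    fp = healthy - tn                          # healthy but flagged
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


counts = confusion_counts(1000, 0.01, 0.90, 0.90)
print(counts)  # {'TP': 9, 'FN': 1, 'FP': 99, 'TN': 891}
```

Deriving FN and FP by subtraction keeps each row summing exactly to its total, so rounding never creates or loses a person.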

How each measure is built from the four cells

Sensitivity = TP ÷ (TP + FN), read across the disease + row
Specificity = TN ÷ (TN + FP), read across the disease − row
PPV = TP ÷ (TP + FP), read down the test + column
NPV = TN ÷ (TN + FN), read down the test − column
LR+ = sensitivity ÷ (1 − specificity)
LR− = (1 − sensitivity) ÷ specificity
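Each of these formulas maps directly onto code. A minimal sketch, with `measures_from_cells` as a hypothetical helper name, fed the counts from the 2×2 table:

```python
def measures_from_cells(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    sens = tp / (tp + fn)       # across the disease + row
    spec = tn / (tn + fp)       # across the disease − row
    ppv = tp / (tp + fp)        # down the test + column
    npv = tn / (tn + fn)        # down the test − column
    lr_pos = sens / (1 - spec)  # how much a positive result raises the odds
    lr_neg = (1 - sens) / spec  # how much a negative result lowers the odds
    return {"sensitivity": sens, "specificity": spec, "PPV": ppv,
            "NPV": npv, "LR+": lr_pos, "LR-": lr_neg}


m = measures_from_cells(tp=9, fn=1, fp=99, tn=891)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that the LRs are built only from sensitivity and specificity, while PPV and NPV mix cells from both disease rows, which is why only the latter move with prevalence.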

PPV (if test +): 8.3%
NPV (if test −): 99.9%
Sensitivity: 90.0%
Specificity: 90.0%
LR+: 9.0
LR−: 0.11

Probability tree

Split the population by disease, then by test result. Each path multiplies to a joint probability; Bayes just compares the two test-positive leaves.

P(disease | test +) = 0.90% ÷ (0.90% + 9.9%) = 8.3%
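The tree can be written out as four joint probabilities. In the sketch below, the function `tree_leaves` and its leaf labels are illustrative, not part of the tool; it checks that the leaves cover everyone and then repeats the Bayes step by comparing the two test-positive leaves.

```python
def tree_leaves(prevalence: float, sensitivity: float, specificity: float) -> dict[str, float]:
    """Joint probability of each path: P(disease branch) x P(test branch | disease branch)."""
    return {
        "D+ T+": prevalence * sensitivity,              # true positive
        "D+ T-": prevalence * (1 - sensitivity),        # false negative
        "D- T+": (1 - prevalence) * (1 - specificity),  # false positive
        "D- T-": (1 - prevalence) * specificity,        # true negative
    }


leaves = tree_leaves(0.01, 0.90, 0.90)
assert abs(sum(leaves.values()) - 1) < 1e-12  # every person ends on exactly one leaf

# Bayes: of everyone on a test-positive leaf, what share came down the disease branch?
ppv = leaves["D+ T+"] / (leaves["D+ T+"] + leaves["D- T+"])
print(f"{leaves['D+ T+']:.2%} / ({leaves['D+ T+']:.2%} + {leaves['D- T+']:.2%}) = {ppv:.1%}")
```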

How PPV collapses with prevalence

Holding sensitivity and specificity fixed, the value of a positive result depends almost entirely on how common the disease is. The dashed line marks the current prevalence.

At the current prevalence: PPV 8.3%, NPV 99.9%.

The base-rate fallacy: at 1.00% prevalence, even a 90%/90% test makes a positive result correct only 8% of the time. Accuracy isn't the whole story — the base rate is.
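The curve is the PPV formula evaluated across a range of prevalences. In the minimal sketch below, `ppv` and `breakeven_prevalence` are hypothetical helpers; the second solves sensitivity × prevalence = (1 − specificity) × (1 − prevalence) for the prevalence at which true and false positives are equal, i.e. PPV = 50%. For a 90%/90% test that point is 10% prevalence.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    true_pos = sensitivity * prevalence               # P(disease and test +)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(no disease and test +)
    return true_pos / (true_pos + false_pos)


def breakeven_prevalence(sensitivity: float, specificity: float) -> float:
    """Prevalence at which true positives equal false positives, so PPV = 50%."""
    false_pos_rate = 1 - specificity
    return false_pos_rate / (sensitivity + false_pos_rate)


for prev in (0.001, 0.01, 0.05, 0.10, 0.30, 0.50):
    print(f"prevalence {prev:6.1%} -> PPV {ppv(prev, 0.90, 0.90):5.1%}")
print(f"PPV reaches 50% at prevalence {breakeven_prevalence(0.90, 0.90):.1%}")
```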

Fagan nomogram

The likelihood ratio is the lever that turns a pre-test probability into a post-test one — and it doesn't depend on prevalence. Line shown for a positive result (LR+).

Pre-test 1.00% (odds 1:99) × LR+ 9.0 → post-test odds 9:99 = 1:11 → post-test 8.3%

This is Bayes' theorem in odds form: prior odds × likelihood ratio = posterior odds. The LR is the strength of the evidence; as a common rule of thumb, LR+ > 10 “rules in” and LR− < 0.1 “rules out”.
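The odds-form update is easy to get wrong by multiplying the probability itself (1% × 9 would give 9%, not 8.3%). A minimal sketch of the correct conversion, with `post_test_probability` as a hypothetical helper name:

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    prior_odds = pre_test / (1 - pre_test)          # probability -> odds
    posterior_odds = prior_odds * likelihood_ratio  # Bayes' theorem in odds form
    return posterior_odds / (1 + posterior_odds)    # odds -> probability


sensitivity, specificity = 0.90, 0.90
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(f"pre-test 1% -> after a positive: {post_test_probability(0.01, lr_pos):.1%}")
print(f"pre-test 1% -> after a negative: {post_test_probability(0.01, lr_neg):.2%}")
```

The positive branch reproduces the PPV, and the negative branch gives 1 − NPV, the residual chance of disease after a negative result.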

The treatment

Treatment outcomes

108 interventions performed — 99 on people who never had the disease and so could not benefit.

Helped: 1
Harmed: 5
Treated, no change: 102
Missed (untreated): 1

Number needed to screen: 1,112
NNT (input): 10
NNH (input): 20
ARR (= 1 / NNT): 10.0%

Per 1,000 screened: about 1 helped, 5 harmed by treatment, 99 false alarms, and 1 missed.
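One way to reproduce these outcome figures, under assumptions that are our reading of the numbers and not confirmed by the tool: every test-positive is treated, each treated true positive benefits with probability 1/NNT, anyone treated is harmed with probability 1/NNH, and helped and harmed are treated as non-overlapping. `treatment_outcomes` is a hypothetical helper; NNS is rounded up, as is conventional for NNT-style counts.

```python
import math


def treatment_outcomes(tp: float, fp: float, fn: float, nnt: float, nnh: float,
                       uptake: float = 1.0, screened: int = 1000) -> dict[str, float]:
    """Expected outcomes under a simple illustrative model (assumptions in comments)."""
    treated_sick = tp * uptake     # true positives who go on to treatment
    treated_healthy = fp * uptake  # false positives treated after a false alarm
    treated = treated_sick + treated_healthy
    helped = treated_sick / nnt    # assumption: only people with the disease can benefit
    harmed = treated / nnh         # assumption: anyone treated can be harmed
    no_change = treated - helped - harmed  # assumes helped and harmed do not overlap
    # NNS = screened / helped, rounded up; infinite if nobody is helped
    nns = math.ceil(screened / helped) if helped > 0 else math.inf
    # Simplification: untreated test-positives (uptake < 1) are not counted anywhere
    return {"treated": treated, "helped": helped, "harmed": harmed,
            "no_change": no_change, "missed": fn, "NNS": nns}


outcomes = treatment_outcomes(tp=9, fp=99, fn=1, nnt=10, nnh=20)
for name, value in outcomes.items():
    print(f"{name}: {value:,.1f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Under those assumptions the model gives 0.9 helped, 5.4 harmed, about 101.7 unchanged, and an NNS of 1,112, which round to the figures shown.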

Watch the relative-vs-absolute trap: a large “relative risk reduction” can still mean a large NNT when the baseline risk is low. Benefit (NNT) only means something next to its harms — that's why helped and harmed are always shown on the same denominator here.
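A short sketch of the relative-versus-absolute trap, using a hypothetical 50% relative risk reduction at three made-up baseline risks: ARR = baseline risk × RRR, and NNT = 1 / ARR. `nnt_from_relative` is a hypothetical helper name.

```python
def nnt_from_relative(baseline_risk: float, relative_risk_reduction: float) -> float:
    arr = baseline_risk * relative_risk_reduction  # absolute risk reduction
    return 1 / arr                                 # NNT = 1 / ARR


# Hypothetical example: the same "50% risk reduction" at three different baseline risks
for baseline in (0.20, 0.02, 0.002):
    nnt = nnt_from_relative(baseline, 0.50)
    print(f"baseline {baseline:.1%}: ARR {baseline * 0.50:.2%}, NNT {nnt:,.0f}")
```

The headline relative reduction never changes, yet the NNT grows a hundredfold as the baseline risk falls from 20% to 0.2%.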

Repeat testing

Serial testing — the false-alarm pile-up

Repeat a test on a healthy person and the chance of at least one false alarm climbs as 1 − specificityⁿ after n rounds. That formula assumes independent rounds. Real repeat tests of the same kind tend to be correlated, which usually keeps the true figure below the curve, while testing for several conditions at once adds more chances each round and can push it above.

Chart: 1 − specⁿ at the current specificity, with real-study reference points.

Reference points (≥1 false positive): Elmore et al., NEJM 1998 — 49% after 10 mammograms · Croswell et al., Ann Fam Med 2009 — ~60% (men) / 49% (women) after 14 multimodal PLCO tests.
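The curve, plus a rough back-calculation, in Python. Both helpers are hypothetical, and both lean on the independence assumption discussed above, so the inverted figure is only a consistency check against the curve, not an estimate of real mammography specificity.

```python
def cumulative_false_positive(specificity: float, rounds: int) -> float:
    """P(at least one false positive in `rounds` independent tests of a healthy person)."""
    return 1 - specificity ** rounds


def implied_per_round_specificity(cumulative_fp: float, rounds: int) -> float:
    """Per-round specificity that would give `cumulative_fp` if rounds were independent."""
    return (1 - cumulative_fp) ** (1 / rounds)


for n in (1, 5, 10, 14):
    print(f"{n:>2} rounds at 90% specificity: {cumulative_false_positive(0.90, n):.0%}")
print(f"49% after 10 rounds implies ~{implied_per_round_specificity(0.49, 10):.1%} per round")
```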

Bayesian updating

Bayesian updating: learning a rate from data

Where do numbers like sensitivity or prevalence come from? You start with a prior belief, observe data, and get a posterior. For a rate with a Beta(α, β) prior and k successes observed in n trials, the update is exact and runs right here, with no simulation: posterior = Beta(α + k, β + n − k).

Prior Beta(2, 2) + data 7/10 → posterior Beta(9, 5)

Posterior mean: 64.3%
95% credible interval: 39%–86%
Prior mean → data: 50% → 70%

Prior α: 2
Prior β: 2
Trials observed (n): 10
Successes (k): 7

The posterior mean sits between your prior mean and the observed rate — and the more data you collect, the more the data wins and the tighter the interval. That shrinking uncertainty is what the screening sliders quietly assume away by treating each rate as a fixed number.
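The conjugate update and its summary numbers can be reproduced with the standard library alone. This is a sketch: `beta_update` and `beta_quantiles` are hypothetical helpers, and the credible interval is approximated on a fine numerical grid rather than with an exact inverse CDF (a library function such as `scipy.stats.beta.ppf` would normally supply that).

```python
import bisect
import itertools
import math


def beta_update(alpha: float, beta: float, n: int, k: int) -> tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + k successes in n trials -> Beta posterior."""
    return alpha + k, beta + (n - k)


def beta_quantiles(a: float, b: float, probs: list[float], grid: int = 20_000) -> list[float]:
    """Approximate Beta(a, b) quantiles from a cumulative sum over a midpoint grid."""
    xs = [(i + 0.5) / grid for i in range(grid)]
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    density = [math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
               for x in xs]
    cdf = list(itertools.accumulate(density))
    total = cdf[-1]  # normalizing by the grid sum absorbs discretization error
    return [xs[min(bisect.bisect_left(cdf, p * total), grid - 1)] for p in probs]


a_post, b_post = beta_update(2, 2, n=10, k=7)
mean = a_post / (a_post + b_post)  # posterior mean of a Beta(a, b) is a / (a + b)
low, high = beta_quantiles(a_post, b_post, [0.025, 0.975])
print(f"posterior Beta({a_post}, {b_post}): mean {mean:.1%}, 95% interval {low:.0%} to {high:.0%}")
```

Rerunning with a larger n at the same observed rate shows the interval tightening, which is the shrinking uncertainty the paragraph above describes.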

How to use this calculator

Set the population — how many people you screen.
Set the prevalence — the pre-test probability, i.e. how common the condition is in that group. This is the single biggest driver of predictive value.
Set the test’s sensitivity and specificity — how well it catches the condition and how well it clears the healthy.
Set treatment uptake and the NNT / NNH — how many test-positives go on to treatment, and how often that treatment helps or harms.

Every view — the cascade, the 2×2 confusion matrix, the Bayes tree, the prevalence→PPV curve, the Fagan nomogram, the serial-testing curve, and the helped / harmed outcomes — recomputes live as you change any input. Use the cited examples next to each control to drop in real-world figures.

A worked example: why a positive test can still be a false alarm

Take 1,000 people, a condition with 1% prevalence, and a test that is 90% sensitive and 90% specific — the values this tool starts with. Of the 1,000, about 10 have the condition and 990 do not. The test correctly flags 9 of the 10 true cases (true positives) but also wrongly flags 99 of the 990 healthy people (false positives). So 108 people get a positive result, yet only 9 actually have the condition — a positive predictive value of about 8%. The result feels alarming, but most positives are false. That gap between a test’s accuracy and what a positive result actually means is the base-rate fallacy, and it is the whole point of this tool.
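The worked example can also be checked by brute force: simulate individual people and count. A minimal sketch, assuming independent random draws with a fixed seed; `simulate_ppv` is a hypothetical helper, and the estimate wobbles slightly around 8.3% if you change the seed.

```python
import random


def simulate_ppv(people: int, prevalence: float, sensitivity: float,
                 specificity: float, seed: int = 1) -> float:
    """Share of simulated test-positives who truly have the condition."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_pos = false_pos = 0
    for _ in range(people):
        sick = rng.random() < prevalence
        if sick:
            positive = rng.random() < sensitivity   # caught by the test
        else:
            positive = rng.random() >= specificity  # healthy but wrongly flagged
        if positive and sick:
            true_pos += 1
        elif positive:
            false_pos += 1
    return true_pos / (true_pos + false_pos)


estimate = simulate_ppv(200_000, prevalence=0.01, sensitivity=0.90, specificity=0.90)
print(f"simulated PPV: {estimate:.1%} (exact answer: {9 / 108:.1%})")
```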

Frequently asked questions

What is positive predictive value (PPV)?

PPV is the probability that someone who tests positive truly has the condition. Unlike sensitivity and specificity — which are properties of the test — PPV also depends heavily on prevalence: the rarer the condition, the lower the PPV, even for a very accurate test.

Why can a positive test still mean you probably don’t have the condition?

When a condition is rare, the healthy group is so much larger than the sick group that even a small false-positive rate produces more false positives than true positives. In the worked example above, a 90% / 90% test at 1% prevalence gives a PPV of only about 8%.

How do you calculate PPV from sensitivity, specificity, and prevalence?

By Bayes’ theorem: PPV = (sensitivity × prevalence) ÷ [ sensitivity × prevalence + (1 − specificity) × (1 − prevalence) ]. The negative predictive value is the mirror image for people who test negative: NPV = specificity × (1 − prevalence) ÷ [ specificity × (1 − prevalence) + (1 − sensitivity) × prevalence ].
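The PPV formula and its NPV counterpart as a minimal Python sketch (`predictive_values` is a hypothetical helper). Comparing 1% and 20% prevalence with the same 90%/90% test shows PPV rising from about 8% to about 69% while sensitivity and specificity stay fixed.

```python
def predictive_values(prevalence: float, sensitivity: float,
                      specificity: float) -> tuple[float, float]:
    """Return (PPV, NPV) from Bayes' theorem."""
    tp = sensitivity * prevalence              # P(disease and test +)
    fp = (1 - specificity) * (1 - prevalence)  # P(no disease and test +)
    tn = specificity * (1 - prevalence)        # P(no disease and test -)
    fn = (1 - sensitivity) * prevalence        # P(disease and test -)
    return tp / (tp + fp), tn / (tn + fn)


for prev in (0.01, 0.20):
    ppv_value, npv_value = predictive_values(prev, 0.90, 0.90)
    print(f"prevalence {prev:.0%}: PPV {ppv_value:.1%}, NPV {npv_value:.1%}")
```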

What is a likelihood ratio, and why doesn’t it depend on prevalence?

A likelihood ratio summarizes how much a result shifts the odds of disease: LR+ = sensitivity ÷ (1 − specificity); LR− = (1 − sensitivity) ÷ specificity. Because they are built only from the test’s sensitivity and specificity, likelihood ratios are independent of prevalence — which is exactly why a Fagan nomogram can turn any pre-test probability into a post-test probability.

What is a Fagan nomogram?

A three-column chart: draw a line from your pre-test probability through the test’s likelihood ratio and it lands on the post-test probability. It is a visual form of Bayesian updating, and this tool draws one live.

Why does repeated (serial) screening raise the chance of a false positive?

Each additional round is another opportunity for a false alarm. If rounds were independent, the cumulative chance of at least one false positive would be 1 − specificityⁿ. Repeat tests of the same kind are usually correlated, which tends to keep the real figure below that curve, while testing for several conditions at once adds chances each round. Either way, observed rates are high: about 49% after 10 mammograms (Elmore, 1998) and roughly 60% for men (49% for women) after 14 multimodal PLCO screening tests (Croswell, 2009).

What is the difference between NNT, NNH, and NNS?

NNT (number needed to treat) is how many people must be treated for one to benefit; NNH (number needed to harm) is how many before one is harmed; NNS (number needed to screen) folds the whole chain together — how many must be screened for one person to be helped.

References & sources

This is an educational model, not a description of any specific test. The example figures offered next to each control are drawn from primary literature; key sources include:

Test accuracy — USPSTF, BCSC mammography benchmarks (Lehman, 2017), NLST low-dose CT, FIT meta-analysis (Lee, 2014), cfDNA NIPT meta-analysis (Gil, 2017).
Cumulative false positives — Elmore et al., NEJM 1998; Croswell et al., Ann Fam Med 2009.
Treatment benefit & harm (NNT / NNH) — Cochrane reviews and trial meta-analyses, with figures cross-checked against theNNT.com (e.g. statins, aspirin, anticoagulation, antibiotics).

Each input’s example callout links its own primary source. Formulas: PPV / NPV via Bayes; LR+ = sens / (1−spec); LR− = (1−sens) / spec; NNT = 1 / ARR; NNS = screened ÷ helped.