Causal inference primer
Bar: state the potential-outcomes definition of a treatment effect, explain confounding with a worked Simpson's-paradox table, and say why a propensity model ≠ an uplift model.
Prediction ≠ causation (the whole point)
A predictive model answers "given what I see, what will happen?" A causal model answers "if I intervene, what changes?" These come apart constantly: ice-cream sales predict drownings (both caused by summer), but banning ice cream saves no one. The business often thinks it wants a prediction when it actually wants to change the outcome (Predict vs cause checklist).
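The ice-cream example can be checked with a small simulation. This is a minimal sketch under made-up assumptions: the function `simulate`, the temperature range, and every coefficient are hypothetical and chosen only so that temperature drives both variables while ice cream has no effect on drownings.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate(n, ban_ice_cream=False):
    """Return (ice_cream_sales, drownings) for n simulated days."""
    sales, drownings = [], []
    for _ in range(n):
        temperature = random.uniform(0, 35)  # common cause (summer)
        # Sales depend on temperature unless we intervene and ban ice cream.
        ice = 0.0 if ban_ice_cream else 10 * temperature + random.gauss(0, 20)
        # Drownings depend on temperature only; ice cream never enters here.
        drowned = 0.2 * temperature + random.gauss(0, 1)
        sales.append(ice)
        drownings.append(drowned)
    return sales, drownings


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


sales, drownings = simulate(5000)
_, drownings_after_ban = simulate(5000, ban_ice_cream=True)

correlation = pearson(sales, drownings)
change_from_ban = mean(drownings_after_ban) - mean(drownings)

print(f"correlation(sales, drownings)      = {correlation:.2f}")
print(f"change in mean drownings after ban = {change_from_ban:+.2f}")
```

Sales would be a useful feature for predicting drownings here, yet setting sales to zero leaves the drowning rate unchanged up to sampling noise. The predictive question and the interventional question have different answers on the same data.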
Potential outcomes
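Each unit $i$ has two potential outcomes: $Y_i(1)$ if treated and $Y_i(0)$ if untreated.
- Individual effect: $\tau_i = Y_i(1) - Y_i(0)$. Never directly observable, because you see only one of the two outcomes.
- ATE: $\mathbb{E}[Y(1) - Y(0)]$; CATE: $\mathbb{E}[Y(1) - Y(0) \mid X = x]$ (effect for a subgroup).
- Fundamental problem of causal inference: for any unit you observe one outcome, never the counterfactual. Causal inference is the science of recovering the missing half.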
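To make these definitions concrete, here is a toy table in which both potential outcomes are written down for every unit, something real data never gives you. The six units, the segment names, and the assignment `t` are all hypothetical, chosen so the arithmetic can be checked by hand.

```python
# Hypothetical units: (segment, y0, y1, t).
# y0 / y1 are the potential outcomes without / with treatment; t is the
# treatment actually received. Real data reveals only one of y0, y1 per unit.
units = [
    ("new", 0, 1, 0),
    ("new", 0, 1, 0),
    ("new", 0, 0, 1),
    ("returning", 1, 1, 1),
    ("returning", 1, 1, 1),
    ("returning", 0, 1, 0),
]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


# Individual effects: computable here only because the table is made up.
individual_effects = [y1 - y0 for _, y0, y1, _ in units]

# ATE: average of the individual effects over all units.
ate = mean(individual_effects)

# CATE: the same average restricted to one subgroup.
cate = {
    seg: mean(y1 - y0 for s, y0, y1, _ in units if s == seg)
    for seg in ("new", "returning")
}

# What an analyst actually sees: one outcome per unit.
observed = [(t, y1 if t == 1 else y0) for _, y0, y1, t in units]

# Naive estimate: difference in observed means between the two arms.
naive = mean(y for t, y in observed if t == 1) - mean(y for t, y in observed if t == 0)

print("ATE  :", ate)
print("CATE :", cate)
print("naive:", naive)
```

In this toy table the naive difference in observed means (2/3) overstates the ATE (1/2), because treatment went mostly to units whose outcome was already 1 without it. Assignment was not independent of the potential outcomes, so the observed halves do not stand in for the missing ones.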
Confounding (why correlation lies)
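A confounder is a variable $Z$ that affects both who gets treated ($T$) and the outcome ($Y$). It opens a non-causal path $T \leftarrow Z \rightarrow Y$, so the raw association between $T$ and $Y$ mixes the treatment's effect with the confounder's.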
Worked: Simpson's paradox (by hand)
Recovery rates for a treatment, split by case severity:
| Severity | Treated | Untreated |
|---|---|---|
| Mild | 90% (180/200) | 85% (170/200) |
| Severe | 50% (100/200) | 40% (80/200) |
| Aggregate | 70% (280/400) | 62.5% (250/400) |
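Within each severity level the treatment helps. In this table both arms have the same severity mix (200 mild, 200 severe), so the aggregate agrees with the strata. But suppose doctors give the treatment mostly to severe cases: the treated arm is then dominated by patients with a poor prognosis, and the aggregate can reverse and make the treatment look harmful. Severity is the confounder; you must condition on it. The aggregate number is not the causal number.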
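The reversal can be reproduced with a few lines of arithmetic. The sketch below keeps the per-stratum recovery rates from the table (90/85 and 50/40) but uses hypothetical group sizes in which the treatment goes mostly to severe cases. The counts are invented for illustration, not data.

```python
# Hypothetical counts: (recovered, total) per severity stratum and arm.
# Rates match the table; only the group sizes are changed.
counts = {
    "mild": {"treated": (18, 20), "untreated": (170, 200)},
    "severe": {"treated": (100, 200), "untreated": (8, 20)},
}


def rate(recovered, total):
    return recovered / total


def aggregate_rate(arm):
    """Recovery rate of one arm with the strata pooled together."""
    recovered = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return recovered / total


# Effect within each stratum: treated rate minus untreated rate.
stratum_effects = {
    s: rate(*counts[s]["treated"]) - rate(*counts[s]["untreated"])
    for s in counts
}

# Pooled comparison that ignores severity.
aggregate_effect = aggregate_rate("treated") - aggregate_rate("untreated")

# Standardization: weight each stratum effect by the stratum's population share.
stratum_size = {s: sum(total for _, total in counts[s].values()) for s in counts}
n_total = sum(stratum_size.values())
adjusted_effect = sum(stratum_effects[s] * stratum_size[s] / n_total for s in counts)

for s, effect in stratum_effects.items():
    print(f"{s:>9}: {effect:+.3f}")
print(f"aggregate: {aggregate_effect:+.3f}")
print(f" adjusted: {adjusted_effect:+.3f}")
```

With these counts the treatment helps in both strata, looks harmful in the pooled comparison, and comes out positive again once the stratum effects are weighted by population share.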
How to actually estimate effects
- Randomization (A/B test): assignment is independent of everything → no confounding. This is the gold standard, and the reason offline lift must be confirmed online (Online vs offline gap).
- Observational adjustment: condition on confounders — stratification, regression, propensity-score weighting (IPW), matching. Valid only under no unmeasured confounders (an untestable assumption).
- Quasi-experiments: difference-in-differences, instrumental variables, regression discontinuity — borrow a source of "as-if random" variation.
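As one illustration of observational adjustment, the sketch below applies inverse propensity weighting (IPW) to hypothetical patient records in which severe cases are treated far more often than mild ones. The counts, the helper `expand`, and the use of per-stratum treated shares as the propensity score are assumptions made for this example. It also takes for granted that severity is the only confounder, which real data cannot confirm.

```python
# Hypothetical unit-level records: (severity, treated, recovered).
def expand(severity, treated, n_recovered, n_total):
    """Turn a (recovered, total) count into individual 0/1 records."""
    return ([(severity, treated, 1)] * n_recovered
            + [(severity, treated, 0)] * (n_total - n_recovered))


records = (
    expand("mild", 1, 18, 20)
    + expand("mild", 0, 170, 200)
    + expand("severe", 1, 100, 200)
    + expand("severe", 0, 8, 20)
)


def propensity(severity):
    """Estimated e(x) = P(T = 1 | severity = x): treated share in the stratum."""
    flags = [t for s, t, _ in records if s == severity]
    return sum(flags) / len(flags)


e = {s: propensity(s) for s in ("mild", "severe")}
n = len(records)

# Naive comparison of observed recovery rates, ignoring severity.
treated_y = [y for _, t, y in records if t == 1]
untreated_y = [y for _, t, y in records if t == 0]
naive_effect = sum(treated_y) / len(treated_y) - sum(untreated_y) / len(untreated_y)

# IPW: each treated unit is weighted by 1 / e(x), each untreated unit by
# 1 / (1 - e(x)), so both arms are reweighted to the full population's mix.
mean_y1 = sum(y / e[s] for s, t, y in records if t == 1) / n
mean_y0 = sum(y / (1 - e[s]) for s, t, y in records if t == 0) / n
ipw_effect = mean_y1 - mean_y0

print(f"propensity by severity: {e}")
print(f"naive effect: {naive_effect:+.3f}")
print(f"IPW effect  : {ipw_effect:+.3f}")
```

With a single binary confounder and empirical propensities, IPW gives the same answer as weighting the stratum effects by population share. It earns its keep when there are many covariates and stratifying on all of them is impractical.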
Propensity vs uplift (the for-profit crux)
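- Propensity model: $P(Y = 1 \mid X)$, who will convert. Target them and you mostly reach people who would have converted anyway (Goodhart: you optimize a proxy and waste budget). "Propensity" here means propensity to convert, not the propensity score $P(T = 1 \mid X)$ used for IPW above.
- Uplift / CATE model: $P(Y = 1 \mid X, T = 1) - P(Y = 1 \mid X, T = 0)$, who converts because of the treatment. That is the population worth spending on.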
This distinction is the Q3 "causal turn" of the roadmap → Attribution / uplift.
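A toy ranking shows how the two targeting rules diverge. The four segments and their conversion rates below are hypothetical. In practice the treated and control rates would have to be estimated, ideally from a randomized holdout.

```python
# Hypothetical segments: conversion rate with and without the treatment
# (for example, a discount coupon).
segments = {
    "sure things": {"p_treated": 0.90, "p_control": 0.88},
    "persuadables": {"p_treated": 0.40, "p_control": 0.10},
    "lost causes": {"p_treated": 0.03, "p_control": 0.02},
    "sleeping dogs": {"p_treated": 0.20, "p_control": 0.30},
}

# Propensity-style score: how likely the segment is to convert when targeted.
conversion = {name: s["p_treated"] for name, s in segments.items()}

# Uplift score: how much the treatment changes the conversion rate.
uplift = {name: s["p_treated"] - s["p_control"] for name, s in segments.items()}

by_propensity = sorted(segments, key=lambda name: conversion[name], reverse=True)
by_uplift = sorted(segments, key=lambda name: uplift[name], reverse=True)

print("ranked by conversion probability:", by_propensity)
print("ranked by uplift                :", by_uplift)
for name in by_uplift:
    print(f"  {name:<14} uplift = {uplift[name]:+.2f}")
```

Ranking by conversion probability puts the "sure things" first, although the treatment barely moves them. Ranking by uplift puts the "persuadables" first and flags the "sleeping dogs", for whom the treatment lowers conversion in this made-up table.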
By-hand exercise (meets the bar)
- Construct numbers where treatment helps in two subgroups yet hurts in aggregate (reverse the table above).
- Sketch the DAG (severity $\to$ treatment, severity $\to$ outcome, treatment $\to$ outcome), and mark which path randomization removes.
Links
- Built on: Probability for sequences (conditional independence)
- Drives: Attribution / uplift · Predict vs cause checklist · Offline→online checklist