Causal inference primer

#evergreen #foundations #causality


Bar: state the potential-outcomes definition of a treatment effect, explain confounding with a worked Simpson's-paradox table, and say why a propensity model ≠ an uplift model.

Prediction ≠ causation (the whole point)

A predictive model answers "given what I see, what will happen?" A causal model answers "if I intervene, what changes?" These come apart constantly: ice-cream sales predict drownings (both caused by summer), but banning ice cream saves no one. The business often thinks it wants a prediction when it actually wants to change the outcome (Predict vs cause checklist).
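To make "predict" versus "intervene" concrete, here is an exact calculation on hypothetical numbers (all probabilities are made up for illustration). Season drives both ice-cream buying and drowning risk, and ice cream has no effect. Conditioning on ice cream (prediction) raises the drowning estimate. Setting ice cream by intervention, written do(ice), leaves it unchanged.

```python
from fractions import Fraction as F

# Hypothetical probabilities, chosen only for illustration.
P_SEASON = {"summer": F(1, 2), "winter": F(1, 2)}
P_ICE = {"summer": F(8, 10), "winter": F(1, 10)}      # P(buys ice cream | season)
P_DROWN = {"summer": F(3, 100), "winter": F(1, 100)}  # P(drowning | season); ice cream plays no role


def p_drown_given_ice(ice: bool) -> F:
    """Prediction: P(drown | ice). Seeing ice cream shifts our belief about the season."""
    joint = {s: P_SEASON[s] * (P_ICE[s] if ice else 1 - P_ICE[s]) for s in P_SEASON}
    total = sum(joint.values())
    return sum(joint[s] / total * P_DROWN[s] for s in P_SEASON)


def p_drown_do_ice(ice: bool) -> F:
    """Intervention: P(drown | do(ice)). Forcing ice cream on or off cuts the
    season -> ice arrow, so the season keeps its own distribution and, because
    drowning depends only on season, the value of `ice` drops out."""
    return sum(P_SEASON[s] * P_DROWN[s] for s in P_SEASON)


for ice in (True, False):
    print(ice, p_drown_given_ice(ice), p_drown_do_ice(ice))
```

Prediction gives 1/36 with ice cream versus 3/220 without (roughly double the risk). Intervention gives 1/50 either way, so banning ice cream changes nothing.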

Potential outcomes

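Each unit $i$ has two potential outcomes: $Y_i(1)$ if treated and $Y_i(0)$ if not. The individual treatment effect is $\tau_i = Y_i(1) - Y_i(0)$. Only one of the two outcomes is ever observed for a given unit (the fundamental problem of causal inference), so we estimate averages such as the average treatment effect $\text{ATE} = \mathbb{E}[Y(1) - Y(0)]$.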
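A toy table makes the definitions concrete. The units and outcomes below are hypothetical; the point is that `ite` and `ate` need both potential-outcome columns, while real data only ever supplies `observed`.

```python
from fractions import Fraction

# Hypothetical units: (name, Y(0), Y(1), T actually received).
# Only a simulation can hold both potential outcomes for the same unit.
units = [
    ("a", 0, 1, 1),
    ("b", 1, 1, 0),
    ("c", 0, 1, 0),
    ("d", 1, 0, 1),
]

# Individual treatment effect: tau_i = Y_i(1) - Y_i(0).
ite = {name: y1 - y0 for name, y0, y1, _ in units}

# Average treatment effect: ATE = E[Y(1) - Y(0)].
ate = Fraction(sum(ite.values()), len(units))

# Observed outcome: Y = T*Y(1) + (1 - T)*Y(0); the other potential outcome is never seen.
observed = {name: y1 if t else y0 for name, y0, y1, t in units}

print(ite, ate, observed)
```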

Confounding (why correlation lies)

A confounder $Z$ causes both treatment $T$ and outcome $Y$, creating a spurious $T\!-\!Y$ association ($T \leftarrow Z \rightarrow Y$). The naïve contrast $\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]$ then mixes the real effect with selection bias: the treated and untreated groups already differed before anyone was treated.
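The "mixes the real effect with selection" claim is an exact identity. Assuming the observed $Y$ equals the potential outcome of the arm received, adding and subtracting $\mathbb{E}[Y(0)\mid T=1]$ splits the naïve contrast into the effect on the treated, $\mathbb{E}[Y(1)-Y(0)\mid T=1]$, plus selection bias, $\mathbb{E}[Y(0)\mid T=1]-\mathbb{E}[Y(0)\mid T=0]$. The sketch checks this on hypothetical units where healthier units opt into treatment; the numbers are invented and every unit's true effect is +1.

```python
from fractions import Fraction


def avg(xs):
    xs = list(xs)
    return Fraction(sum(xs), len(xs))


# Hypothetical units as (Y0, Y1, T). Healthier units (higher Y0) opt into
# treatment: that opt-in is the confounding. Every unit's true effect is +1.
units = [
    (5, 6, 1), (6, 7, 1), (7, 8, 1),   # treated
    (2, 3, 0), (3, 4, 0), (4, 5, 0),   # untreated
]
treated = [u for u in units if u[2] == 1]
control = [u for u in units if u[2] == 0]

# Naive contrast from observed data: E[Y | T=1] - E[Y | T=0].
naive = avg(y1 for _, y1, _ in treated) - avg(y0 for y0, _, _ in control)
# Effect on the treated: E[Y(1) - Y(0) | T=1] (needs the unobservable Y(0) of treated units).
att = avg(y1 - y0 for y0, y1, _ in treated)
# Selection bias: E[Y(0) | T=1] - E[Y(0) | T=0], how the groups differed before treatment.
selection_bias = avg(y0 for y0, _, _ in treated) - avg(y0 for y0, _, _ in control)

print(naive, att, selection_bias)  # → 4 1 3, and naive == att + selection_bias
```

The naïve comparison reports an effect of 4 when the true effect is 1; the other 3 points are pre-existing differences between the groups.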

Worked: Simpson's paradox (by hand)

Recovery rates for a treatment, split by case severity:

Severity    Treated          Untreated
Mild        90% (180/200)    85% (170/200)
Severe      50% (100/200)    40% (80/200)
Aggregate   70% (280/400)    62.5% (250/400)

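Within each severity stratum the treatment helps (+5 points for mild cases, +10 for severe). Here the arms are balanced across severity (200 patients per cell), so the aggregate agrees (+7.5 points). But suppose doctors give the treatment mostly to severe cases: the treated group is then dominated by low-recovery patients, and the aggregate can reverse and make the treatment look harmful. Severity is the confounder, so you must condition on it, either by comparing within strata or by reweighting the strata to a common severity mix. The aggregate number is not the causal number.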
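Conditioning on severity can be written as standardization (back-door adjustment): weight each stratum's treated-minus-untreated difference by that stratum's share of all patients. The sketch applies it to the table's counts and to a hypothetical skewed allocation that keeps every within-stratum rate but sends treatment mostly to severe cases. The skewed counts and the helper names are illustrative, and the adjustment is only valid if severity is the sole confounder.

```python
from fractions import Fraction

# Counts as (recovered, total) per severity stratum and arm.
balanced = {  # the table above
    "mild":   {"treated": (180, 200), "untreated": (170, 200)},
    "severe": {"treated": (100, 200), "untreated": (80, 200)},
}
skewed = {  # hypothetical: same within-stratum rates, treatment mostly given to severe cases
    "mild":   {"treated": (45, 50),   "untreated": (255, 300)},
    "severe": {"treated": (175, 350), "untreated": (40, 100)},
}


def rate(counts):
    recovered, total = counts
    return Fraction(recovered, total)


def aggregate(table, arm):
    """Pooled recovery rate for one arm, ignoring severity."""
    recovered = sum(table[s][arm][0] for s in table)
    total = sum(table[s][arm][1] for s in table)
    return Fraction(recovered, total)


def adjusted_effect(table):
    """Standardize over severity: sum_z P(z) * (rate_treated(z) - rate_untreated(z)),
    where P(z) is stratum z's share of all patients."""
    n = sum(total for arms in table.values() for _, total in arms.values())
    effect = Fraction(0)
    for arms in table.values():
        p_z = Fraction(sum(total for _, total in arms.values()), n)
        effect += p_z * (rate(arms["treated"]) - rate(arms["untreated"]))
    return effect


for name, table in (("balanced", balanced), ("skewed", skewed)):
    naive = aggregate(table, "treated") - aggregate(table, "untreated")
    print(name, float(naive), float(adjusted_effect(table)))
```

The balanced table gives +0.075 both ways. The skewed one has a naïve difference of −0.1875 (treatment looks harmful), while the adjusted effect stays positive at 5/64 ≈ 0.078.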

How to actually estimate effects

Propensity vs uplift (the for-profit crux)

This distinction is the Q3 "causal turn" of the roadmap → Attribution / uplift.
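A small sketch of why the two rankings differ. It assumes "propensity" in the marketing sense (how likely a customer converts when contacted), and it assumes each segment's two conversion rates come from a randomized test. The segments and rates are hypothetical; the names follow the usual uplift-modelling quadrants. The uplift here is the simplest two-model difference, P(convert | contacted) − P(convert | not contacted). The causal-inference propensity score P(T=1 | X) is a different quantity, but it is not an effect either.

```python
# Hypothetical segments with conversion rates in percent, from an assumed
# randomized test: (rate if contacted, rate if not contacted).
segments = {
    "sure_thing":   (62, 60),
    "persuadable":  (30, 10),
    "lost_cause":   (1, 1),
    "sleeping_dog": (35, 45),
}


def propensity(seg):
    """Conversion propensity: how likely this segment converts when contacted."""
    treated, _ = segments[seg]
    return treated


def uplift(seg):
    """Uplift: conversion if contacted minus conversion if not, in percentage points."""
    treated, control = segments[seg]
    return treated - control


by_propensity = sorted(segments, key=propensity, reverse=True)
by_uplift = sorted(segments, key=uplift, reverse=True)
print(by_propensity)  # sure_thing first, sleeping_dog second
print(by_uplift)      # persuadable first, sleeping_dog last
```

Targeting by propensity spends budget on sure things, who buy anyway, and on sleeping dogs, whom contact actually puts off. Targeting by uplift goes after the persuadables.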

By-hand exercise (meets the bar)

  1. Construct numbers where treatment helps in two subgroups yet hurts in aggregate (reverse the table above).
  2. Sketch the DAG $T \leftarrow Z \rightarrow Y$, $T \rightarrow Y$ and mark which path randomization removes.