EXP — <one-line title>

planned #experiment

Up: Experiment-Log-MOC · Track: Track — ...

⚠️ Fill in everything up to and including ## Hypothesis before running anything. Predict the result first.

Question

What am I actually trying to find out? (one sentence)

Hypothesis

I expect <specific, falsifiable prediction> because <reason>. If true, I should see <metric> ≳ <threshold>. If false, I should see <what shows up instead>.

Setup

Data: source, size, date range, data_version above. Split by user & time (never by row).
Ground truth: what's the answer key? (synthetic = the planted pattern; real = the label + its provenance)
Method / model: the minimum needed to test the hypothesis.
Metrics logged: primary + the secondary ones I'll sanity-check (calibration, base rates, per-segment).
Controls / baselines: the dumb baseline this must beat to mean anything.
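
Sketch of the "split by user & time (never by row)" rule, for when I forget what it means in practice. This is one possible reading, not the only one: each user is hashed into train or test, train keeps only rows before a cutoff, and test keeps only rows at or after it. The field names `user_id` and `ts`, the 20% test fraction, and both function names are placeholders I made up for illustration.

```python
import hashlib


def user_bucket(user_id, test_fraction=0.2):
    """Deterministically assign a user to 'train' or 'test' from a hash of the id."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    u = int(digest[:8], 16) / 0x100000000
    return "test" if u < test_fraction else "train"


def split_by_user_and_time(rows, cutoff, test_fraction=0.2):
    """Split rows so no user and no time period is shared between train and test.

    rows: iterable of dicts with hypothetical keys 'user_id' and 'ts'.
    cutoff: any value comparable with row['ts'].
    """
    train, test = [], []
    for row in rows:
        bucket = user_bucket(row["user_id"], test_fraction)
        if bucket == "train" and row["ts"] < cutoff:
            train.append(row)
        elif bucket == "test" and row["ts"] >= cutoff:
            test.append(row)
        # Train users after the cutoff and test users before it are dropped:
        # keeping them would reintroduce user or time overlap.
    return train, test
```

The price of this strictness is data: a chunk of rows is thrown away. If that hurts, note the trade-off under Setup rather than quietly falling back to a row-level split.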

Pre-registered checks (run before trusting the number)

[ ] Leakage checklist — temporal, target, split integrity
[ ] Base rates / class balance sane? (conversions often 1–3%)
[ ] Result not too good to be true → Healthy paranoia
[ ] Offline→online checklist if this is meant to inform a real decision
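
Two of the checks above (split integrity and base rates) are mechanical enough to script. A minimal sketch, assuming rows are dicts with hypothetical keys `user_id`, `ts`, and `label`; the default 1–3% band just reuses the conversion range mentioned above and should be replaced with whatever is sane for the dataset at hand. It does not cover target leakage, which still needs a manual pass over the features.

```python
def check_split_integrity(train, test):
    """Return a list of problems; an empty list means the split looks clean."""
    problems = []
    shared = {r["user_id"] for r in train} & {r["user_id"] for r in test}
    if shared:
        problems.append(f"{len(shared)} user(s) appear in both train and test")
    if train and test:
        latest_train = max(r["ts"] for r in train)
        earliest_test = min(r["ts"] for r in test)
        if latest_train >= earliest_test:
            problems.append("train contains rows at or after the earliest test row")
    return problems


def base_rate(rows, label_key="label"):
    """Fraction of rows whose label is truthy."""
    if not rows:
        raise ValueError("cannot compute a base rate on zero rows")
    return sum(1 for r in rows if r[label_key]) / len(rows)


def check_base_rate(rows, expected_low=0.01, expected_high=0.03, label_key="label"):
    """Flag a base rate outside the range I expected before looking."""
    rate = base_rate(rows, label_key)
    if expected_low <= rate <= expected_high:
        return []
    return [f"base rate {rate:.4f} outside expected [{expected_low}, {expected_high}]"]
```

Run these before looking at the primary metric, and paste any non-empty output into Result rather than fixing it silently.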

Result

Numbers: (table; primary metric vs baseline, with the secondary checks)
Plots / artifacts: links (kept out of git; see .gitignore)
Run: run_id above

Interpretation

Did the hypothesis hold? Why is the number what it is? (Not "it passed" — why.) What did the secondary metrics say? Anything that looked wrong, and did I chase it down?

Verdict & decision

Verdict: ✅ confirmed / ❌ refuted / 🤷 inconclusive
Business decision it produces: (e.g. "windows past 30 min don't help → stop investing there")
Negative result? Still a full write-up — it's information. See Experimental equanimity.

The one next experiment this suggests (link it once created).