EXP — <one-line title>
planned#experiment
Up: Experiment-Log-MOC · Track: Track — ...
⚠️ Write everything down to ## Hypothesis before running anything. Predict the result first.
Question
What am I actually trying to find out? (one sentence)
Hypothesis
I expect <specific, falsifiable prediction> because
Setup
- Data: source, size, date range,
data_versionabove. Split by user & time (never by row). - Ground truth: what's the answer key? (synthetic = the planted pattern; real = the label + its provenance)
- Method / model: the minimum needed to test the hypothesis.
- Metrics logged: primary + the secondary ones I'll sanity-check (calibration, base rates, per-segment).
- Controls / baselines: the dumb baseline this must beat to mean anything.
Pre-registered checks (run before trusting the number)
- [ ] Leakage checklist — temporal, target, split integrity
- [ ] Base rates / class balance sane? (conversions often 1–3%)
- [ ] Result not too good to be true → Healthy paranoia
- [ ] Offline→online checklist if this is meant to inform a real decision
Result
- Numbers: (table; primary metric vs baseline, with the secondary checks)
- Plots / artifacts: links (kept out of git; see
.gitignore) - Run:
run_idabove
Interpretation
Did the hypothesis hold? Why is the number what it is? (Not "it passed" — why.) What did the secondary metrics say? Anything that looked wrong, and did I chase it down?
Verdict & decision
- Verdict: ✅ confirmed / ❌ refuted / 🤷 inconclusive
- Business decision it produces: (e.g. "windows past 30 min don't help → stop investing there")
- Negative result? Still a full write-up — it's information. See Experimental equanimity.
Next
- The one next experiment this suggests (link it once created).