Reading roadmap
seed #moc #literature #reading-list
Up: 50_Literature-MOC · plan
The tight path through everything collected here, sequenced to the 12-month plan. Read in service of doing (the next experiment), not completionism — but as a newcomer, read enough prior art to avoid reinventing solved wheels (Fast iteration beats parallelism). ✅ = open access.
Full lists: Foundations reading list · Clickstream foundational papers · Sequence modeling papers · Anomaly & drift papers · Uplift modeling papers.
▶ Start tomorrow (don't overthink it)
- MacKay ch. 2 (entropy/KL) → then do the by-hand exercise in Cross-entropy and KL divergence.
- Skim srivastava2000webusagemining (the WUM pipeline): 30-min pass-1 only.
- Read laub2015hawkes §1–2: you'll use it in the generator.
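
To check the by-hand entropy/KL exercise against a machine, a minimal sketch along these lines may help. The distributions `p` and `q` are made-up illustrations, not values from MacKay or from the Cross-entropy and KL divergence note, and logs are taken base 2 (bits).

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits; infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 * log(anything) is taken as 0
        if qi == 0:
            return math.inf
        total -= pi * math.log2(qi)
    return total

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p); non-negative, zero iff p == q."""
    return cross_entropy(p, q) - entropy(p)

# Hypothetical example distributions (assumed for illustration only).
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

print("H(p)     =", entropy(p))
print("H(p, q)  =", cross_entropy(p, q))
print("KL(p||q) =", kl_divergence(p, q))
print("KL(q||p) =", kl_divergence(q, p))  # differs: KL is not symmetric
```

Working the same numbers by hand first, then comparing, is the point; the asymmetry between the last two lines is worth noticing before the later phases rely on it.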
Phase 0 · Foundations & the lab habit (Q1)
Goal: enough math to build the synthetic generator and measure recovery honestly.
- Info theory & ML toolkit: mackay2003information (entropy/KL chapters) · hastie2009esl or murphy2022pml (skim).
- The generator's math: rabiner1989hmm (HMM/Viterbi) · laub2015hawkes + rizoiu2017hawkes (Hawkes) · clauset2009powerlaw (heavy tails) · bacry2017tick (simulate it).
- Tie-in: these back the by-hand exercises in 10_Foundations-MOC and the Track — Synthetic generator + 3-algorithm bake-off.
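
As a rough picture of what the Hawkes part of a synthetic generator could look like, here is a hand-rolled sketch of Ogata-style thinning for a univariate process. It is not the tick API and not necessarily how the actual generator is built. It assumes an exponential kernel with intensity λ(t) = mu + Σ alpha·exp(−beta·(t − t_i)), so the branching ratio is alpha/beta; the parameter values are arbitrary.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Assumed intensity: mu + sum_i alpha * exp(-beta * (t - t_i)).
    Requires mu > 0 and alpha < beta (branching ratio below 1).
    """
    if mu <= 0 or not alpha < beta:
        raise ValueError("need mu > 0 and alpha < beta")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t
    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay to the candidate time
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each event bumps the intensity
    return events

# Arbitrary illustrative parameters: branching ratio 0.5.
times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.6, horizon=2000.0, seed=42)
print(len(times), "events; stationary mean would be about",
      0.5 * 2000.0 / (1 - 0.8 / 1.6))
```

Because the true parameters are known, a fitted model can be scored on how well it recovers mu, alpha, and beta, which is the "measure recovery honestly" part of the goal.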
Phase 1 · Domain grounding (Q1→Q2)
Goal: don't reinvent sessionization / path modeling.
- srivastava2000webusagemining (the field's vocabulary) → spiliopoulou2003sessionreconstruction (sessionization, the hard part) → montgomery2004pathanalysis (Markov path prediction).
- Backs What is a clickstream · Sessionization · Funnels and journeys.
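
For orientation before reading, the simplest time-based sessionization heuristic can be sketched as below. The 30-minute inactivity timeout and the `(user_id, timestamp_seconds)` event shape are assumptions for illustration; the papers above discuss why such heuristics are only a starting point.

```python
from collections import defaultdict

def sessionize(events, timeout_s=1800):
    """Split each user's events into sessions on inactivity gaps.

    events: iterable of (user_id, timestamp_seconds) in any order.
    A gap strictly greater than timeout_s starts a new session.
    Returns a list of (user_id, [timestamps]) tuples.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = []
    for user, stamps in by_user.items():
        stamps.sort()
        current = [stamps[0]]
        for prev, ts in zip(stamps, stamps[1:]):
            if ts - prev > timeout_s:
                sessions.append((user, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((user, current))
    return sessions

# Hypothetical toy log: user "a" returns after a gap of more than 30 minutes.
log = [("a", 0), ("a", 600), ("a", 4000), ("b", 100), ("a", 4100)]
for user, stamps in sessionize(log):
    print(user, stamps)
```

The timeout is the knob to question while reading: changing it silently changes every downstream funnel and path statistic.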
Phase 2 · First real track + skepticism reflex (Q2)
Goal: one shippable-or-killed result, with leakage/measurement caught. Read only the arm your track needs.
- If prediction/discovery: rendle2010fpmc → hidasi2016gru4rec → kang2018sasrec (+ wang2019sequential to map the field). Prereq: vaswani2017attention.
- If anomaly/drift: chandola2009anomaly → gama2014conceptdrift → bifet2007adwin / liu2012isolation.
- Always: evaluation discipline (Evaluation theory); run the Leakage checklist and Offline→online checklist.
Phase 3 · The causal turn (Q3)
Goal: predict vs cause — uplift, not just propensity.
- gutierrez2017uplift (survey) → kunzel2019metalearners (S/T/X-learners) → wager2018causal (causal forests) → anderl2016mapping (Markov attribution).
- Backs Causal inference primer · Attribution / uplift · Predict vs cause checklist.
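
To make "uplift, not just propensity" concrete, here is a toy T-learner sketch: one outcome model fitted on treated rows, one on control rows, and uplift taken as the difference of their predictions. The base learner here is deliberately trivial (a per-segment mean), the data are invented, and treatment is assumed to be randomized; none of this comes from the cited papers.

```python
from collections import defaultdict

def fit_mean_model(rows):
    """Toy base learner: mean outcome per segment.

    rows: iterable of (segment, outcome). Returns {segment: mean}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for seg, y in rows:
        sums[seg][0] += y
        sums[seg][1] += 1
    return {seg: total / n for seg, (total, n) in sums.items()}

def t_learner_uplift(data):
    """T-learner: separate models for treated and control rows.

    data: list of (segment, treated, outcome).
    Returns {segment: mu1(segment) - mu0(segment)}.
    """
    mu1 = fit_mean_model((s, y) for s, t, y in data if t)
    mu0 = fit_mean_model((s, y) for s, t, y in data if not t)
    return {s: mu1[s] - mu0[s] for s in mu1.keys() & mu0.keys()}

# Invented data: "loyal" users convert often regardless of treatment,
# "new" users convert less often but respond to treatment.
data = (
    [("loyal", True, y) for y in (1, 1, 1, 1, 0)]
    + [("loyal", False, y) for y in (1, 1, 1, 1, 0)]
    + [("new", True, y) for y in (1, 1, 0, 0)]
    + [("new", False, y) for y in (0, 0, 0, 1)]
)

uplift = t_learner_uplift(data)
for seg in sorted(uplift):
    print(seg, "estimated uplift:", uplift[seg])
```

A propensity model would rank "loyal" first; the uplift estimate ranks "new" first. That gap is the reason for this phase.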
Phase 4 · Communicate & compound (Q4)
- Re-read the surveys (wang2019sequential, gama2014conceptdrift) to place your results in the field.
- One quote per source, paraphrase the rest; every paper gets a note from 80_Literature-Template.