Foundations reading list
Up: 10_Foundations-MOC · also indexed under 50_Literature-MOC
The texts behind the 10 Foundations notes. You don't read these cover-to-cover — you read the chapter that backs whichever foundation you're deriving by hand. Each entry below names the note it supports. OA legend: ✅ open · 🟡 free copy · 🔒 paywalled.
🎯 If you read only three to start:
mackay2003information (information theory), hastie2009esl or murphy2022pml (the ML toolkit), rabiner1989hmm (HMMs, straight into the synthetic generator).
Core math & ML
⬜ Information Theory, Inference, and Learning Algorithms
MacKay (2003) · book, Cambridge · mackay2003information · ✅ free to read (printing restricted)
🔗 https://www.inference.org.uk/itprnn/book.pdf
Backs: Cross-entropy and KL divergence · Information theory basics. The single best unified treatment of entropy/cross-entropy/KL + Bayesian inference — the exact losses and divergences you use day one.
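To make the day-one losses concrete, here is a minimal plain-Python sketch of entropy, cross-entropy and KL. It uses natural logs (nats), and the two distributions are made-up toy numbers. It checks the identity that links the three quantities, H(p, q) = H(p) + KL(p ‖ q):

```python
import math

def entropy(p):
    # H(p) = -sum_x p(x) log p(x), in nats; terms with p(x) = 0 contribute 0
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log q(x): expected code length under q for data drawn from p
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_x p(x) log(p(x) / q(x)) >= 0, and 0 iff p == q
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # illustrative "true" next-event distribution
q = [0.5, 0.3, 0.2]   # illustrative model prediction

h, ce, kl = entropy(p), cross_entropy(p, q), kl_divergence(p, q)
print(f"H(p)={h:.4f}  H(p,q)={ce:.4f}  KL(p||q)={kl:.4f}")
# Cross-entropy decomposes as entropy + KL
assert abs(ce - (h + kl)) < 1e-12
```

Because H(p) does not depend on the model q, minimizing a cross-entropy loss over q and minimizing KL(p ‖ q) pick the same q.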
⬜ The Elements of Statistical Learning (2nd ed.)
Hastie, Tibshirani & Friedman (2009) · book, Springer · hastie2009esl · ✅ free PDF
🔗 https://hastie.su.domains/ElemStatLearn/
Backs: SVD and low-rank structure · Evaluation theory. The classical ML/stats toolkit (regularization, PCA/SVD, trees/ensembles) you'll benchmark sequence models against.
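A minimal NumPy sketch of the low-rank idea. It assumes a synthetic users × items matrix (a rank-3 signal plus small noise, invented for illustration). It truncates the SVD to the top k singular values and checks the Eckart–Young identity: the Frobenius error of the best rank-k approximation equals the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: rank-3 signal plus small noise (e.g. users x items)
signal = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
X = signal + 0.01 * rng.normal(size=(50, 40))

def truncated_svd(X, k):
    # Keep the top-k singular triplets: X_k = U_k diag(s_k) V_k^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :], s

k = 3
X_k, s = truncated_svd(X, k)
err = np.linalg.norm(X - X_k, "fro")
# Eckart-Young: best rank-k Frobenius error = sqrt(sum of discarded s_i^2)
print(f"rank-{k} relative error: {err / np.linalg.norm(X, 'fro'):.4f}")
```

The same truncation underlies PCA. Centering the columns first and keeping k components is the classical baseline ESL describes.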
⬜ Probabilistic Machine Learning: An Introduction
Murphy (2022) · book, MIT Press · murphy2022pml · ✅ free draft (CC-BY-NC-ND)
🔗 https://probml.github.io/pml-book/book1.html
Backs: Probability for sequences. Modern, accessible probabilistic ML — cleaner notation than Bishop, up-to-date on deep sequence models. A live alternative/complement to ESL.
⬜ Causal Inference: What If
Hernán & Robins (continuously updated) · book · hernan2020causal · ✅ free PDF
🔗 https://miguelhernan.org/whatifbook
Backs: Causal inference primer. The accessible entry to counterfactuals + DAG identification — "what would have happened if the user hadn't seen this event?"
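A toy, exact-arithmetic sketch of why adjustment matters. It assumes a made-up DAG in which a confounder L (say, being a power user) drives both exposure A and outcome Y. Standardization over L (which Hernán & Robins present as a form of the g-formula) recovers the causal contrast that the naive comparison inflates. This is valid only under their identification conditions: no unmeasured confounding given L, positivity, and consistency.

```python
# Toy DAG: L -> A, L -> Y, A -> Y. All probabilities are illustrative.
P_L = {0: 0.7, 1: 0.3}                 # P(L = l)
P_A_given_L = {0: 0.2, 1: 0.8}         # P(A = 1 | L = l)
E_Y = {(0, 0): 0.10, (1, 0): 0.15,     # E[Y | A = a, L = l], keyed by (a, l)
       (0, 1): 0.40, (1, 1): 0.45}

def p_a_given_l(a, l):
    return P_A_given_L[l] if a == 1 else 1 - P_A_given_L[l]

def naive_mean(a):
    # E[Y | A = a]: mixes in L's effect, because L also drives exposure
    p_a = sum(P_L[l] * p_a_given_l(a, l) for l in P_L)
    return sum(P_L[l] * p_a_given_l(a, l) * E_Y[(a, l)] for l in P_L) / p_a

def standardized_mean(a):
    # Standardization: E[Y^a] = sum_l P(L = l) * E[Y | A = a, L = l]
    return sum(P_L[l] * E_Y[(a, l)] for l in P_L)

naive = naive_mean(1) - naive_mean(0)
causal = standardized_mean(1) - standardized_mean(0)
print(f"naive difference: {naive:.3f}   standardized effect: {causal:.3f}")
```

Here exposure raises Y by 0.05 within every stratum of L, but exposed users are mostly power users, so the naive difference comes out around four times larger.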
⬜ Convex Optimization
Boyd & Vandenberghe (2004) · book, Cambridge · boyd2004convex · ✅ free PDF
🔗 https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
Backs: Optimization & gradients. Gradient descent, duality, convergence — the optimization vocabulary under every training loop.
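A minimal sketch of plain gradient descent on an illustrative strongly convex quadratic f(x) = ½xᵀQx − bᵀx, where Q and b are arbitrary choices. It uses the classic step size 1/L, where L (the largest eigenvalue of Q) is the Lipschitz constant of the gradient:

```python
# Illustrative problem: Q is symmetric positive definite, b arbitrary.
Q = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def grad(x):
    # Gradient of 0.5 x^T Q x - b^T x is Q x - b
    return [sum(Q[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]

def gradient_descent(x0, step, iters):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Eigenvalues of Q are (5 +/- sqrt(5)) / 2; the largest is the Lipschitz constant L.
# Any step in (0, 2/L) converges; 1/L is the textbook choice.
L = (5 + 5 ** 0.5) / 2
x = gradient_descent([0.0, 0.0], step=1 / L, iters=200)
print(x)  # approaches the minimizer Q^{-1} b = [0.2, 0.4]
```

With step 1/L the error shrinks by at most a factor of 1 − μ/L per iteration, where μ is the smallest eigenvalue. That factor is about 0.618 here, so 200 iterations are far more than enough.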
⬜ Reinforcement Learning: An Introduction (2nd ed.)
Sutton & Barto (2018) · book, MIT Press · sutton2018rl · ✅ free PDF
🔗 http://incompleteideas.net/book/the-book-2nd.html
Backs: Optimization & gradients (policy-gradient section). Target ch. 13 (REINFORCE / policy-gradient theorem) — the score-function trick, ancestor of sequence-level training objectives.
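The score-function trick in miniature, under assumptions chosen for illustration: a softmax "policy" over three actions with fixed rewards. The REINFORCE estimator averages f(a)∇θ log pθ(a) over sampled actions. It should agree with a finite-difference gradient of the exact expected reward, up to Monte Carlo noise.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [v / z for v in e]

def expected_reward(theta, f):
    # J(theta) = E_{a ~ softmax(theta)}[f(a)], computed exactly over the K actions
    return sum(p * r for p, r in zip(softmax(theta), f))

def reinforce_gradient(theta, f, n_samples, rng):
    # Score-function estimator: grad J ~= mean over samples of f(a) * grad log p_theta(a).
    # For a softmax, d/d theta_k log p_theta(a) = 1[a == k] - p_k.
    p = softmax(theta)
    K = len(theta)
    grad = [0.0] * K
    for a in rng.choices(range(K), weights=p, k=n_samples):
        for k in range(K):
            grad[k] += f[a] * ((1.0 if a == k else 0.0) - p[k])
    return [g / n_samples for g in grad]

def finite_difference_gradient(theta, f, eps=1e-6):
    # Central differences on the exact J, used only as a reference
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((expected_reward(up, f) - expected_reward(down, f)) / (2 * eps))
    return grad

theta = [0.5, -0.2, 0.1]   # illustrative logits
f = [1.0, 0.0, 0.3]        # illustrative per-action rewards
print(reinforce_gradient(theta, f, 100_000, random.Random(0)))
print(finite_difference_gradient(theta, f))
```

Subtracting a constant baseline from f keeps the estimator unbiased, since E[∇θ log pθ(a)] = 0. It is the usual first step toward reducing the estimator's variance.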
Stochastic processes & data craft
(these directly enable the synthetic generator — Generating background traffic)
⬜ A Tutorial on Hidden Markov Models…
Rabiner (1989) · Proc. IEEE 77(2):257–286 · tutorial · rabiner1989hmm · 🟡 paywall, widely mirrored
🔗 https://ieeexplore.ieee.org/document/18626
Backs: Markov chains and HMMs. The canonical Forward–Backward / Viterbi / Baum-Welch derivation. Model session intent as hidden states; because the generator samples those states itself, the ground-truth assignments are known.
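A compact sketch of the forward recursion and Viterbi decoding. It assumes a hypothetical two-state intent model (browse vs. buy) over three event types, and all probabilities are invented. A brute-force enumeration over every state path is included as a check; it is only feasible for tiny sequences.

```python
import itertools

# Hypothetical HMM: hidden intent 0 = browse, 1 = buy;
# observed events 0 = view, 1 = add_to_cart, 2 = checkout.
pi = [0.8, 0.2]                       # initial state distribution
A = [[0.9, 0.1],                      # A[i][j] = P(z_{t+1} = j | z_t = i)
     [0.3, 0.7]]
B = [[0.7, 0.25, 0.05],               # B[i][o] = P(x_t = o | z_t = i)
     [0.2, 0.4, 0.4]]

def forward(obs):
    # alpha_t(j) = P(x_1..x_t, z_t = j); summing alpha_T gives P(x_1..x_T)
    alpha = [pi[j] * B[j][obs[0]] for j in range(len(pi))]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(len(pi))) * B[j][o]
                 for j in range(len(pi))]
    return sum(alpha)

def viterbi(obs):
    # delta_t(j) = best joint probability of any path ending in state j at time t
    S = range(len(pi))
    delta = [pi[j] * B[j][obs[0]] for j in S]
    back = []
    for o in obs[1:]:
        ptr = [max(S, key=lambda i: delta[i] * A[i][j]) for j in S]
        delta = [delta[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in S]
        back.append(ptr)
    path = [max(S, key=lambda j: delta[j])]
    for ptr in reversed(back):        # follow backpointers from the end
        path.append(ptr[path[-1]])
    return path[::-1]

def brute_force(obs):
    # Enumerate all |S|^T paths: exponential, only for checking small cases
    def joint(path):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        return p
    paths = list(itertools.product(range(len(pi)), repeat=len(obs)))
    return sum(joint(p) for p in paths), list(max(paths, key=joint))

obs = [0, 0, 1, 2, 2]
print(forward(obs), viterbi(obs), brute_force(obs))
```

On long sessions these raw products underflow. Working in log space (or rescaling α at each step, as Rabiner's tutorial describes) avoids that.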
⬜ Hawkes Processes
Laub, Taimre & Pollett (2015) · arXiv:1507.02822 · survey-tutorial · laub2015hawkes · ✅ open
🔗 https://arxiv.org/abs/1507.02822
Backs: Point processes. Compact, precise: conditional intensity, Ogata-thinning simulation, MLE fitting — the math for self-exciting (bursty) event generators.
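A minimal Ogata-thinning sketch for a univariate Hawkes process. It assumes the exponential kernel α e^{−βt}, so the branching ratio is α/β and must be below 1. Parameter values are illustrative. The sketch recomputes the intensity from scratch at every step, which is O(n²) but easy to check:

```python
import math
import random

def intensity(t, events, mu, alpha, beta):
    # lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta (t - t_i))
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events if ti < t)

def simulate_hawkes(mu, alpha, beta, T, rng):
    # Ogata thinning. Between events this intensity only decays, so its value
    # just after the current time t is a valid upper bound lam_bar.
    events, t = [], 0.0
    while True:
        lam_bar = mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)   # next candidate from a rate-lam_bar Poisson process
        if t > T:
            return events
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)            # accept with probability lambda(t) / lam_bar

events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, T=200.0, rng=random.Random(1))
print(len(events), "events; stationary expectation about", 0.5 * 200 / (1 - 0.8 / 1.2))
```

For a stationary process the expected count on [0, T] is roughly μT / (1 − α/β), which is 300 for these parameters. That gives a quick sanity check on the simulator. Exponential kernels also allow a recursive O(1) intensity update, which helps for longer horizons.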
⬜ A Tutorial on Hawkes Processes for Events in Social Media
Rizoiu, Lee, Mishra & Xie (2017) · arXiv:1708.06401 · tutorial · rizoiu2017hawkes · ✅ open
🔗 https://arxiv.org/abs/1708.06401
Backs: Point processes. Practitioner companion to Laub — worked examples + code; cascade framing maps onto clickstream virality. Read alongside laub2015hawkes.
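The cascade framing rests on the branching (cluster) view of a Hawkes process. Each event independently spawns a Poisson number of direct children with mean equal to the branching ratio n, so the expected cascade size is 1/(1 − n). Below is a small stdlib sketch that checks this by simulation, with n = 0.6 chosen for illustration:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's multiplication method; fine for the small means used here
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def cascade_size(n, rng):
    # Each event independently triggers Poisson(n) direct children;
    # count the root plus every descendant, generation by generation.
    size, frontier = 1, 1
    while frontier:
        children = sum(poisson(n, rng) for _ in range(frontier))
        size += children
        frontier = children
    return size

rng = random.Random(0)
n = 0.6   # illustrative branching ratio; must be < 1 for finite cascades
sizes = [cascade_size(n, rng) for _ in range(20_000)]
print(sum(sizes) / len(sizes), 1 / (1 - n))   # empirical vs. theoretical mean
```

As n approaches 1, the mean cascade size blows up, which is the "viral" regime the clickstream analogy points at.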
⬜ Power-law Distributions in Empirical Data
Clauset, Shalizi & Newman (2009) · SIAM Review 51(4):661–703 · clauset2009powerlaw · ✅ open
🔗 https://arxiv.org/abs/0706.1062 · code: https://aaronclauset.github.io/powerlaws/
Backs: Heavy-tailed distributions. The principled MLE + KS-test way to fit power-law tails (the antidote to naïve log-log regression) — so planted heavy tails are statistically defensible.
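A sketch of the continuous-case fit that assumes x_min is already known. The paper instead picks x_min by minimizing the same KS distance and then adds a bootstrap goodness-of-fit test; both steps are omitted here. The sketch plants a power-law tail with α = 2.5 by inverse-CDF sampling and recovers it with the closed-form MLE α̂ = 1 + n / Σ ln(x_i / x_min):

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    # Inverse-CDF sampling from p(x) proportional to x^(-alpha), x >= x_min (continuous)
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

def fit_alpha(xs, x_min):
    # Continuous MLE: alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the tail x_i >= x_min
    tail = [x for x in xs if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(xs, alpha, x_min):
    # Largest gap between the tail's empirical CDF and the fitted CDF
    # P(x) = 1 - (x / x_min)^(1 - alpha), checked on both sides of each step
    tail = sorted(x for x in xs if x >= x_min)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1 - (x / x_min) ** (1 - alpha)
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d

rng = random.Random(42)
xs = sample_power_law(alpha=2.5, x_min=1.0, n=50_000, rng=rng)   # planted tail
alpha_hat = fit_alpha(xs, x_min=1.0)
print(alpha_hat, ks_distance(xs, alpha_hat, 1.0))
```

The MLE's standard error is about (α̂ − 1)/√n, so with 50,000 samples the recovered exponent should land within a few hundredths of 2.5.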
⬜ Survival Analysis — lifelines documentation
Davidson-Pilon et al. (ongoing) · docs · lifelines2024docs · ✅ open
🔗 https://lifelines.readthedocs.io/en/latest/Survival%20Analysis%20intro.html
Backs: Prediction (time-to-event). Survival S(t), hazard h(t), right-censoring + Kaplan-Meier/Cox — and it's executable Python for time-to-churn / time-to-next-event.
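The Kaplan–Meier product-limit estimator is short enough to write by hand, which makes the role of right-censoring explicit. Here is a plain-Python sketch on a made-up six-user churn table. Censored users stay in the risk set up to their censoring time:

```python
def kaplan_meier(durations, observed):
    # S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the
    # number of events at t_i and n_i the number still at risk (duration >= t_i).
    # observed[i] is False for right-censored subjects (e.g. still active at export).
    times = sorted(set(t for t, e in zip(durations, observed) if e))
    surv, s = [], 1.0
    for t in times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, observed) if e and d == t)
        s *= 1 - deaths / at_risk
        surv.append((t, s))
    return surv

durations = [2, 3, 3, 5, 6, 8]                      # illustrative days until churn
observed = [True, True, False, True, False, True]   # False = censored
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 4))
```

lifelines' KaplanMeierFitter, fitted with these durations and event_observed flags, should produce the same step function. The hand-rolled version mainly shows where censoring enters: it shrinks the risk set without counting as an event.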
⬜ tick: a Python Library… (Hawkes & time-dependent models)
Bacry, Bompaire, Gaïffas & Poulsen (2017) · JMLR 18(214) · software · bacry2017tick · ✅ open
🔗 https://arxiv.org/abs/1707.03003
Backs: Synthetic-data toolkit · Offline compute stack. C++-backed multivariate Hawkes simulation + inference: plant an excitation matrix, simulate from it, then fit and check that you recover it (a ground-truth validation loop).
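Before handing a planted excitation matrix to a simulator, it is worth running a library-independent check. This NumPy sketch does not call tick. It assumes that adjacency[i, j] is the expected number of type-i events directly triggered by one type-j event; check this orientation against the library docs. All values are hypothetical:

```python
import numpy as np

# Hypothetical ground truth for a 3-type process (e.g. view, add_to_cart, checkout)
baseline = np.array([0.5, 0.1, 0.02])   # background rates mu_i
adjacency = np.array([                  # adjacency[i, j]: type-i children per type-j event
    [0.3, 0.1, 0.0],
    [0.2, 0.2, 0.0],
    [0.0, 0.3, 0.1],
])

# Stationarity requires spectral radius < 1; otherwise cascades explode
# and there is no stable process to recover.
rho = float(np.max(np.abs(np.linalg.eigvals(adjacency))))
assert rho < 1, f"planted matrix is explosive (spectral radius {rho:.3f})"

# Stationary mean intensities solve Lambda = mu + adjacency @ Lambda,
# i.e. Lambda = (I - adjacency)^{-1} mu; counts on [0, T] should be near Lambda * T.
mean_rate = np.linalg.solve(np.eye(3) - adjacency, baseline)
print(f"spectral radius {rho:.3f}, stationary rates {mean_rate}")
```

Comparing simulated per-type counts with Λ·T, and then the fitted matrix with `adjacency`, closes the ground-truth loop this entry describes.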