Foundations reading list

#growing #literature #reading-list #foundations

Up: 10_Foundations-MOC · also indexed under 50_Literature-MOC

The texts behind the 10 Foundations notes. You don't read these cover-to-cover — you read the chapter that backs whichever foundation you're deriving by hand. Each entry below names the note it supports. OA legend: ✅ open · 🟡 free copy · 🔒 paywalled.

🎯 If you read only three to start: mackay2003information (information theory), hastie2009esl or murphy2022pml (the ML toolkit), rabiner1989hmm (HMMs — straight into the synthetic generator).

Core math & ML

⬜ Information Theory, Inference, and Learning Algorithms

MacKay (2003) · book, Cambridge · mackay2003information · ✅ free to read (printing restricted) 🔗 https://www.inference.org.uk/itprnn/book.pdf Backs: Cross-entropy and KL divergence · Information theory basics. The single best unified treatment of entropy/cross-entropy/KL + Bayesian inference — the exact losses and divergences you use day one.
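A minimal sketch of the identity this entry leans on, H(p, q) = H(p) + KL(p || q), using natural logs. The two distributions `p` and `q` are illustrative values, not taken from the book.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; assumes q > 0 wherever p > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]  # "true" distribution (illustrative)
q = [0.5, 0.3, 0.2]  # model distribution (illustrative)

h_p = entropy(p)
h_pq = cross_entropy(p, q)
kl_pq = kl_divergence(p, q)

# Cross-entropy splits into irreducible entropy plus the model's KL penalty.
print(f"H(p)={h_p:.4f}  H(p,q)={h_pq:.4f}  KL(p||q)={kl_pq:.4f}")
```

Since H(p) does not depend on the model, minimizing cross-entropy over q is the same as minimizing KL(p || q).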

⬜ The Elements of Statistical Learning (2nd ed.)

Hastie, Tibshirani & Friedman (2009) · book, Springer · hastie2009esl · ✅ free PDF 🔗 https://hastie.su.domains/ElemStatLearn/ Backs: SVD and low-rank structure · Evaluation theory. The classical ML/stats toolkit (regularization, PCA/SVD, trees/ensembles) you'll benchmark sequence models against.

⬜ Probabilistic Machine Learning: An Introduction

Murphy (2022) · book, MIT Press · murphy2022pml · ✅ free draft (CC-BY-NC-ND) 🔗 https://probml.github.io/pml-book/book1.html Backs: Probability for sequences. Modern, accessible probabilistic ML — cleaner notation than Bishop, up-to-date on deep sequence models. A live alternative/complement to ESL.

⬜ Causal Inference: What If

Hernán & Robins (continuously updated) · book · hernan2020causal · ✅ free PDF 🔗 https://miguelhernan.org/whatifbook Backs: Causal inference primer. The accessible entry to counterfactuals + DAG identification — "what would have happened if the user hadn't seen this event?"

⬜ Convex Optimization

Boyd & Vandenberghe (2004) · book, Cambridge · boyd2004convex · ✅ free PDF 🔗 https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf Backs: Optimization & gradients. Gradient descent, duality, convergence — the optimization vocabulary under every training loop.

⬜ Reinforcement Learning: An Introduction (2nd ed.)

Sutton & Barto (2018) · book, MIT Press · sutton2018rl · ✅ free PDF 🔗 http://incompleteideas.net/book/the-book-2nd.html Backs: Optimization & gradients (policy-gradient section). Target ch. 13 (REINFORCE / policy-gradient theorem) — the score-function trick, ancestor of sequence-level training objectives.
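A toy sketch of the score-function trick on a two-armed bandit with a softmax policy: the update is reward times the gradient of log π(action). The reward values, step count, and learning rate are assumptions chosen for illustration, and there is no baseline, so this is the plainest form of REINFORCE rather than anything tuned.

```python
import math
import random

def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a bandit with fixed per-arm rewards (illustrative)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        r = rewards[action]
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[action] = 1[k == action] - probs[k]
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * r * grad_log_pi
    return softmax(theta)

# Arm 1 pays 1.0, arm 0 pays nothing, so the policy should drift toward arm 1.
final_probs = reinforce_bandit(rewards=[0.0, 1.0])
print(final_probs)
```

The same estimator carries over to sequences by replacing the single action with a sampled sequence and `r` with a sequence-level score.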

Stochastic processes & data craft

(these directly enable the synthetic generator — Generating background traffic)

⬜ A Tutorial on Hidden Markov Models…

Rabiner (1989) · Proc. IEEE 77(2):257–286 · tutorial · rabiner1989hmm · 🟡 paywall, widely mirrored 🔗 https://ieeexplore.ieee.org/document/18626 Backs: Markov chains and HMMs. The canonical Forward–Backward / Viterbi / Baum-Welch derivation. Model session intent as hidden states with known ground-truth assignments.
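A small sketch of the "known ground truth" idea: sample hidden states and observations from a two-state HMM, then check how well Viterbi decoding recovers the planted states. The state meanings, matrices, and sequence length are hypothetical values picked for illustration.

```python
import math
import random

def sample_hmm(start, trans, emit, length, rng):
    """Sample (hidden states, observations) from a discrete HMM."""
    states, obs = [], []
    s = rng.choices(range(len(start)), weights=start)[0]
    for _ in range(length):
        states.append(s)
        obs.append(rng.choices(range(len(emit[s])), weights=emit[s])[0])
        s = rng.choices(range(len(trans[s])), weights=trans[s])[0]
    return states, obs

def viterbi(start, trans, emit, obs):
    """Most likely hidden path in log space; assumes all probabilities > 0."""
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, pointers, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical setup: state 0 = "browsing", state 1 = "purchase intent";
# three observable event types.
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.05, 0.95]]
emit = [[0.80, 0.15, 0.05], [0.05, 0.15, 0.80]]

rng = random.Random(0)
true_states, observations = sample_hmm(start, trans, emit, 1000, rng)
decoded = viterbi(start, trans, emit, observations)
accuracy = sum(a == b for a, b in zip(true_states, decoded)) / len(true_states)
print(f"Viterbi recovers {accuracy:.1%} of the planted states")
```

Because the generator keeps `true_states`, any downstream model of session intent can be scored against labels that real logs never provide.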

⬜ Hawkes Processes

Laub, Taimre & Pollett (2015) · arXiv:1507.02822 · survey-tutorial · laub2015hawkes · ✅ open 🔗 https://arxiv.org/abs/1507.02822 Backs: Point processes. Compact, precise: conditional intensity, Ogata-thinning simulation, MLE fitting; the math for self-exciting (bursty) event generators.
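A minimal sketch of Ogata-style thinning for a univariate Hawkes process with an exponential kernel, λ(t) = μ + Σ α·exp(−β(t − tᵢ)). The function name and parameter values are assumptions for illustration; the sketch requires α/β < 1 so the process is stationary.

```python
import math
import random

def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on [0, horizon) by thinning (exponential kernel)."""
    events = []
    t = 0.0
    excitation = 0.0  # sum of alpha * exp(-beta * (t - t_i)) at the current t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        excitation *= math.exp(-beta * wait)
        t += wait
        if t >= horizon:
            break
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event bumps the intensity

    return events

rng = random.Random(0)
mu, alpha, beta, horizon = 1.0, 0.5, 1.0, 1000.0
events = simulate_hawkes(mu, alpha, beta, horizon, rng)

# Stationary mean rate is mu / (1 - alpha / beta), here 2 events per unit time.
print(len(events), "events; expected about", mu * horizon / (1 - alpha / beta))
```

Comparing the simulated count against μ·T / (1 − α/β) is a cheap sanity check before fitting anything to the generated stream.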

⬜ A Tutorial on Hawkes Processes for Events in Social Media

Rizoiu, Lee, Mishra & Xie (2017) · arXiv:1708.06401 · tutorial · rizoiu2017hawkes · ✅ open 🔗 https://arxiv.org/abs/1708.06401 Backs: Point processes. Practitioner companion to Laub et al.: worked examples and code; the cascade framing maps onto clickstream virality. Read alongside laub2015hawkes.

⬜ Power-law Distributions in Empirical Data

Clauset, Shalizi & Newman (2009) · SIAM Review 51(4):661–703 · clauset2009powerlaw · ✅ open 🔗 https://arxiv.org/abs/0706.1062 · code: https://aaronclauset.github.io/powerlaws/ Backs: Heavy-tailed distributions. The principled way to fit power-law tails: MLE for the exponent plus a KS-based goodness-of-fit test (the antidote to naïve log-log regression), so planted heavy tails are statistically defensible.
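A sketch of the continuous-case estimator, α̂ = 1 + n / Σ ln(xᵢ / x_min), checked on a planted tail. It assumes `xmin` is known; the paper also selects x_min and bootstraps a p-value, both of which are left out here. The helper names and parameter values are illustrative.

```python
import math
import random

def sample_power_law(alpha, xmin, n, rng):
    """Inverse-transform samples from p(x) proportional to x^(-alpha), x >= xmin."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def fit_alpha(xs, xmin):
    """Continuous power-law MLE for the exponent, with xmin assumed known."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(xs, alpha, xmin):
    """Max gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (-(alpha - 1.0))
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

rng = random.Random(0)
true_alpha, xmin = 2.5, 1.0
xs = sample_power_law(true_alpha, xmin, 20000, rng)

alpha_hat = fit_alpha(xs, xmin)
ks = ks_distance(xs, alpha_hat, xmin)
print(f"planted alpha={true_alpha}, recovered alpha={alpha_hat:.3f}, KS distance={ks:.4f}")
```

Planting a known exponent and recovering it this way gives a concrete check that a synthetic heavy tail is what it claims to be.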

⬜ Survival Analysis — lifelines documentation

Davidson-Pilon et al. (ongoing) · docs · lifelines2024docs · ✅ open 🔗 https://lifelines.readthedocs.io/en/latest/Survival%20Analysis%20intro.html Backs: Prediction (time-to-event). Survival S(t), hazard h(t), right-censoring, and Kaplan-Meier/Cox, all as executable Python for time-to-churn / time-to-next-event.
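A hand-rolled sketch of the Kaplan-Meier estimator, to show what right-censoring does to the at-risk count. It is plain Python rather than the lifelines API (lifelines provides `KaplanMeierFitter` for real use), and the five durations are made-up toy data.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: time until the event or until censoring
    observed:  True if the event happened, False if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose duration reaches t is still at risk,
        # including subjects censored exactly at t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Toy data: five users, two of them censored (still active when observation ended).
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"S({t}) = {s:.2f}")
```

On this toy data the curve steps down to roughly 0.8, 0.6, and 0.3 at t = 1, 2, 3: censored users shrink the at-risk pool without counting as events.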

⬜ tick: a Python Library… (Hawkes & time-dependent models)

Bacry, Bompaire, Gaïffas & Poulsen (2017) · JMLR 18(214) · software · bacry2017tick · ✅ open 🔗 https://arxiv.org/abs/1707.03003 Backs: Synthetic-data toolkit · Offline compute stack. C++-backed multivariate Hawkes simulation + inference: plant an excitation matrix, then recover it as a ground-truth validation loop.