Clickstream foundational papers

growing #literature #reading-list

Up: 50_Literature-MOC

Tight starter set — decades-old prior art so you don't reinvent solved wheels (Sessionization especially). OA legend: ✅ open · 🟡 free author/preprint copy · 🔒 paywalled. Read with How I read a paper; spin up a full note per paper from 80_Literature-Template.

🎯 Start here: srivastava2000webusagemining — it defines the field's shared vocabulary every other paper assumes. Path: srivastava → spiliopoulou → bucklin2003 → montgomery → bucklin2009 → mobasher

⬜ P1 · Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data

Srivastava, Cooley, Deshpande & Tan (2000) · SIGKDD Explorations 1(2):12–23 · survey · srivastava2000webusagemining · 🟡 partial 🔗 https://dl.acm.org/doi/10.1145/846183.846188 Why: The canonical entry point — defines the three-phase WUM pipeline (preprocessing → pattern discovery → analysis) and the problem taxonomy.

⬜ P2 · A Framework for the Evaluation of Session Reconstruction Heuristics in Web-Usage Analysis

Spiliopoulou, Mobasher, Berendt & Nakagawa (2003) · INFORMS J. Computing 15(2):171–190 · spiliopoulou2003sessionreconstruction · ✅ yes 🔗 http://facweb.cs.depaul.edu/mobasher/research/papers/SMBN03.pdf Why: The definitive treatment of sessionization: a taxonomy and evaluation of time-out / maximal-forward-reference / navigation heuristics. These heuristics carry over directly to most clickstream pipelines.
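A minimal sketch of the time-out family of heuristics, to make the idea concrete before reading the paper. It is not the paper's evaluation framework: the event shape `(user_id, timestamp_seconds, url)`, the function name `sessionize`, and the 30-minute inactivity threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

def sessionize(events, timeout_s=1800):
    """Split each user's hits into sessions at inactivity gaps.

    events: iterable of (user_id, timestamp_seconds, url) tuples (assumed shape).
    timeout_s: gap, in seconds, that closes a session (1800 is an assumed default).
    Returns a list of (user_id, [(timestamp, url), ...]) tuples.
    """
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))

    sessions = []
    for user_id in sorted(by_user):
        current = []
        last_ts = None
        for ts, url in sorted(by_user[user_id]):
            # A gap longer than the threshold closes the running session.
            if last_ts is not None and ts - last_ts > timeout_s:
                sessions.append((user_id, current))
                current = []
            current.append((ts, url))
            last_ts = ts
        if current:
            sessions.append((user_id, current))
    return sessions

events = [
    ("u1", 0, "/"),
    ("u1", 600, "/a"),
    ("u1", 3000, "/b"),  # 2400 s after the previous hit, so a new session
    ("u2", 100, "/"),
]
sessions = sessionize(events)
```

The threshold is the whole modelling decision here, which is why a paper that evaluates such heuristics against known ground truth is worth reading before picking one.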

⬜ P3 · Modeling Online Browsing and Path Analysis Using Clickstream Data

Montgomery, Li, Srinivasan & Liechty (2004) · Marketing Science 23(4):579–595 · montgomery2004pathanalysis · ✅ yes (CMU author page) 🔗 https://www.andrew.cmu.edu/user/alm3/papers/purchase%20conversion.pdf Why: The core Markov path-prediction paper: it shows that a first-order Markov chain is insufficient because memory matters. This is the "solved wheel" that your synthetic generator's transition matrix reinvents on purpose (see Markov chains and HMMs).
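For reference, the first-order baseline that this entry calls insufficient can be sketched in a few lines. This is not the model from the paper; it is only the memoryless transition matrix a simple synthetic generator would use. The function name `transition_matrix` and the page labels are hypothetical.

```python
from collections import Counter, defaultdict

def transition_matrix(paths):
    """Estimate first-order transition probabilities from page paths.

    paths: iterable of page-label sequences, one per session (assumed shape).
    Returns {source_page: {next_page: probability}}.
    """
    counts = defaultdict(Counter)
    for path in paths:
        # Count each consecutive (source, next) pair in the session.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    matrix = {}
    for src, nexts in counts.items():
        total = sum(nexts.values())
        matrix[src] = {dst: n / total for dst, n in nexts.items()}
    return matrix

paths = [
    ["home", "search", "product"],
    ["home", "product"],
    ["home", "search", "home"],
]
matrix = transition_matrix(paths)
```

Each row depends only on the current page, so two visitors on the same page get the same next-step distribution regardless of how they arrived. That is the limitation the "memory matters" finding points at.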

⬜ P4 · Click Here for Internet Insight: Advances in Clickstream Data Analysis in Marketing

Bucklin & Sismeiro (2009) · J. Interactive Marketing 23(1):35–48 · review · bucklin2009clickstreaminsight · 🟡 SSRN 🔗 https://ssrn.com/abstract=1118315 Why: Authoritative literature review of clickstream research — a structured map of prior art, and what clickstream data can vs. cannot support.

⬜ P5 · A Model of Web Site Browsing Behavior Estimated on Clickstream Data

Bucklin & Sismeiro (2003) · J. Marketing Research 40(3):249–267 · bucklin2003browsingmodel · 🔒 no 🔗 https://doi.org/10.1509/jmkr.40.3.249.19241 Why: The foundational empirical browsing model (page-duration + navigation choice jointly). Read alongside P3 to see measurement model vs. prediction model.

⬜ P6 · Automatic Personalization Based on Web Usage Mining

Mobasher, Cooley & Srivastava (2000) · Communications of the ACM 43(8):142–151 · mobasher2000personalization · 🟡 author copy 🔗 https://doi.org/10.1145/345124.345169 · author PDFs: http://facweb.cs.depaul.edu/mobasher/pubs.html Why: Turns mined usage patterns into a real recommendation engine — the WUM taxonomy made into a deployable architecture.