Anomaly & drift papers
Tight starter set — bots/fraud/invalid traffic + distribution shift over time. Feeds Anomaly & change detection and Concept drift in production. OA legend: ✅ open · 🟡 free copy · 🔒 paywalled. Read with How I read a paper.
🎯 Start here:
chandola2009anomaly: its taxonomy (point / contextual / collective; supervised / unsupervised) is the vocabulary every method paper assumes. Path: chandola → gama → page → bifet → liu → truong
⬜ P1 · Anomaly Detection: A Survey
Chandola, Banerjee & Kumar (2009) · ACM Computing Surveys 41(3):15 · survey · chandola2009anomaly · 🟡 author PDF
🔗 https://vs.inf.ethz.ch/edu/HS2011/CPS/papers/chandola09_anomaly-detection-survey.pdf
Why: The definitive taxonomy. The contextual/collective framing maps directly onto bot/fraud detection in clickstreams.
⬜ P2 · A Survey on Concept Drift Adaptation
Gama, Žliobaitė, Bifet, Pechenizkiy & Bouchachia (2014) · ACM Computing Surveys 46(4):44 · survey · gama2014conceptdrift · 🟡 accepted version
🔗 https://eprints.bournemouth.ac.uk/22491/1/ACM%20computing%20surveys.pdf
Why: Why models degrade over time and which detection/retraining strategy to apply (trigger-based vs continuous). The backbone of Concept drift in production.
⬜ P3 · Continuous Inspection Schemes (CUSUM)
Page (1954) · Biometrika 41(1–2):100–115 · foundational · page1954cusum · 🔒 no
🔗 https://doi.org/10.1093/biomet/41.1-2.100
Why: The origin of cumulative-sum sequential change detection — still the standard low-latency online alarm on a univariate statistic. (If paywalled, read its description in any modern textbook; cite Page.)
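A minimal sketch of the one-sided (upward) CUSUM recursion in its common textbook form, not a transcription of Page's original scheme. The names `target`, `slack`, and `threshold`, and the toy stream, are illustrative assumptions.

```python
def cusum_alarm(xs, target, slack, threshold):
    """Return the index of the first upward-shift alarm, or None if no alarm fires."""
    s = 0.0
    for i, x in enumerate(xs):
        # Accumulate deviations above target + slack; clamp at zero so that
        # long in-control stretches do not build up negative "credit".
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None


# Hypothetical stream: the mean jumps from 0 to 1 at index 20.
stream = [0.0] * 20 + [1.0] * 20
first_alarm = cusum_alarm(stream, target=0.0, slack=0.25, threshold=2.0)
print(first_alarm)  # 22, i.e. a detection delay of 2 samples
```

A larger `slack` ignores small shifts; a larger `threshold` trades detection delay for fewer false alarms. A downward-shift detector mirrors the recursion with the sign flipped.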
⬜ P4 · Learning from Time-Changing Data with Adaptive Windowing (ADWIN)
Bifet & Gavaldà (2007) · SIAM SDM, pp. 443–448 · method · bifet2007adwin · 🔒 no
🔗 https://epubs.siam.org/doi/10.1137/1.9781611972771.42
Why: A rigorous adaptive sliding window with bounded false-positive/negative guarantees — directly implementable as an online drift alarm on click rate, session depth, error rate.
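A naive sketch of the adaptive-window idea, assuming inputs scaled to [0, 1]. It checks every split of the window against a Hoeffding-style cut threshold and drops the oldest items while any split fails. The class name `NaiveAdwin` is hypothetical, and this version omits the bucket compression that gives the published algorithm its logarithmic memory and update time.

```python
import math
from collections import deque


class NaiveAdwin:
    """Simplified adaptive window; O(n) memory and O(n) work per split check."""

    def __init__(self, delta=0.01):
        self.delta = delta      # confidence parameter
        self.window = deque()   # oldest item on the left

    def update(self, x):
        """Add x in [0, 1]; return True if the window was shrunk (drift signalled)."""
        self.window.append(x)
        drift = False
        while self._has_cut():
            self.window.popleft()  # drop the oldest observation
            drift = True
        return drift

    def _has_cut(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:
                break
            head_sum += x
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean term of the two sizes
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) >= eps_cut:
                return True
        return False


# Hypothetical error-rate stream: all 0s, then all 1s from index 100.
detector = NaiveAdwin(delta=0.01)
stream = [0.0] * 100 + [1.0] * 100
drift_points = [i for i, x in enumerate(stream) if detector.update(x)]
print(drift_points[0], len(detector.window))
```

For production streams a maintained implementation is the safer choice; this sketch only makes the cut test concrete.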
⬜ P5 · Isolation-Based Anomaly Detection (Isolation Forest)
Liu, Ting & Zhou (2012) · ACM TKDD 6(1):3 (journal extension of ICDM 2008) · method · liu2012isolation · ✅ yes
🔗 https://www.lamda.nju.edu.cn/publication/tkdd11.pdf
Why: The go-to unsupervised anomaly scorer — linear time, low memory, no distance metric. Flags anomalous sessions without labelled fraud. Use the fuller TKDD version.
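A compact pure-Python sketch of isolation-based scoring: random axis-aligned splits, path length to isolation, and the normalised score 2^(-E[h]/c(n)). Function names, the sample size of 64, and the synthetic "sessions" are assumptions for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively partition points with random splits until isolated or depth-capped."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the expected extra depth for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(forest, x):
    """Score in (0, 1); values closer to 1 mean easier to isolate."""
    trees, size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(size))


# Hypothetical 2-D session features: 200 typical sessions plus one extreme one.
rng = random.Random(42)
sessions = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
sessions.append((8.0, 8.0))
forest = fit_forest(sessions)
outlier_score = anomaly_score(forest, (8.0, 8.0))
typical_score = anomaly_score(forest, (0.0, 0.0))
print(round(outlier_score, 3), round(typical_score, 3))
```

In practice a maintained implementation such as scikit-learn's `IsolationForest` is the usual choice; the sketch is only meant to show why no distance metric is needed.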
⬜ P6 · Selective Review of Offline Change Point Detection Methods
Truong, Oudre & Vayatis (2020) · Signal Processing 167:107299 · survey · truong2020changepoint · ✅ yes
🔗 https://arxiv.org/abs/1801.00718
Why: Clear structured review of offline methods (PELT, binary segmentation, window-based and kernel methods) under one formalism, with the ruptures Python library. Use it for retrospective change-point analysis on historical segments, and to measure detection delay (Difficulty knobs).
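A self-contained sketch of PELT with a least-squares (mean-shift) segment cost, to make the penalised-cost-plus-pruning idea concrete. The function name, the `penalty` value, and the noise-free toy signal are assumptions.

```python
def pelt_mean_shift(xs, penalty):
    """Return change-point indices minimising sum of squared errors + penalty per change."""
    n = len(xs)
    # Prefix sums give O(1) segment costs.
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        """Sum of squared deviations from the mean over xs[a:b]."""
        seg = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)

    best = [-penalty] + [0.0] * n   # best[t]: optimal penalised cost of xs[:t]
    last = [0] * (n + 1)            # last[t]: start of the final segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        values = [(best[s] + cost(s, t), s) for s in candidates]
        min_val, last[t] = min(values)
        best[t] = min_val + penalty
        # Pruning step: drop starts that can never be optimal for any later t.
        candidates = [s for v, s in values if v <= best[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts.
    change_points, t = [], n
    while t > 0:
        if last[t] > 0:
            change_points.append(last[t])
        t = last[t]
    return change_points[::-1]


# Hypothetical noise-free signal with mean shifts at indices 30 and 60.
signal = [0.0] * 30 + [5.0] * 30 + [1.0] * 30
change_points = pelt_mean_shift(signal, penalty=10.0)
print(change_points)  # [30, 60]
```

The `penalty` controls how many change points are reported: raising it merges segments, lowering it splits them. The ruptures library covers this and other cost functions with maintained implementations.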
Links
- 50_Literature-MOC · Anomaly & change detection · Concept drift in production · Bots, fraud, and invalid traffic