Anomaly & drift papers

#growing #literature #reading-list

Up: 50_Literature-MOC

Tight starter set — bots/fraud/invalid traffic + distribution shift over time. Feeds Anomaly & change detection and Concept drift in production. OA legend: ✅ open · 🟡 free copy · 🔒 paywalled. Read with How I read a paper.

🎯 Start here: chandola2009anomaly — its taxonomy (point / contextual / collective; supervised / unsupervised) is the vocabulary every method paper assumes. Path: chandola → gama → page → bifet → liu → truong

⬜ P1 · Anomaly Detection: A Survey

Chandola, Banerjee & Kumar (2009) · ACM Computing Surveys 41(3):15 · survey · chandola2009anomaly · 🟡 author PDF 🔗 https://vs.inf.ethz.ch/edu/HS2011/CPS/papers/chandola09_anomaly-detection-survey.pdf
Why: The standard taxonomy of anomaly types and detection settings. Its contextual/collective framing maps directly onto bot and fraud detection in clickstreams.

⬜ P2 · A Survey on Concept Drift Adaptation

Gama, Žliobaitė, Bifet, Pechenizkiy & Bouchachia (2014) · ACM Computing Surveys 46(4):44 · survey · gama2014conceptdrift · 🟡 accepted version 🔗 https://eprints.bournemouth.ac.uk/22491/1/ACM%20computing%20surveys.pdf
Why: Explains why models degrade over time and how to choose a detection/retraining strategy (trigger-based vs continuous adaptation). The backbone of Concept drift in production.

⬜ P3 · Continuous Inspection Schemes (CUSUM)

Page (1954) · Biometrika 41(1–2):100–114 · foundational · page1954cusum · 🔒 no 🔗 https://doi.org/10.1093/biomet/41.1-2.100
Why: The origin of cumulative-sum (CUSUM) sequential change detection, and still a standard low-latency online alarm on a univariate statistic. Since it is paywalled, the CUSUM description in any statistical process control textbook is an adequate substitute for reading; cite Page.
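A minimal two-sided CUSUM sketch, assuming the in-control mean `mu0` is known and using two illustrative tuning constants: the slack `k` (commonly set to about half the mean shift you want to catch) and the alarm threshold `h` (larger means fewer false alarms but longer delay). The function name and example stream are hypothetical. It uses the usual recursive `max(0, ·)` form, which equals Page's cumulative sum minus its running minimum.

```python
def cusum(xs, mu0, k, h):
    """Two-sided CUSUM. Returns (index, "up" or "down") of the first alarm, or None."""
    hi = lo = 0.0
    for t, x in enumerate(xs):
        # Upper statistic: accumulates evidence that the mean rose above mu0 + k.
        hi = max(0.0, hi + (x - mu0 - k))
        # Lower statistic: accumulates evidence that the mean fell below mu0 - k.
        lo = max(0.0, lo + (mu0 - x - k))
        if hi > h:
            return t, "up"
        if lo > h:
            return t, "down"
    return None


# Hypothetical stream: in-control mean 0, then a sustained shift to 1 at index 50.
stream = [0.0] * 50 + [1.0] * 30
print(cusum(stream, mu0=0.0, k=0.5, h=5.0))  # (60, 'up')
```

On the example stream the upper statistic grows by 0.5 per post-change sample and first exceeds `h = 5` at index 60, a detection delay of 10 samples after the shift at index 50.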

⬜ P4 · Learning from Time-Changing Data with Adaptive Windowing (ADWIN)

Bifet & Gavaldà (2007) · SIAM SDM, pp. 443–448 · method · bifet2007adwin · 🔒 no 🔗 https://epubs.siam.org/doi/10.1137/1.9781611972771.42
Why: An adaptive sliding window with provable bounds on false-positive and false-negative rates, directly implementable as an online drift alarm on click rate, session depth, or error rate.
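A minimal sketch of the naive ADWIN0 idea, assuming inputs scaled to [0, 1] (for example a per-session 0/1 bot flag). It tests every split of the window and drops the oldest element while any split's sub-window means differ by at least ε_cut. The Hoeffding-style threshold used here, ε_cut = sqrt(ln(4/δ') / (2m)) with m = 1/(1/n0 + 1/n1) and δ' = δ/n, is my reading of the paper and worth checking against it. This version is quadratic per update; the paper's ADWIN2 compresses the window with exponential histograms so memory and per-item time grow only logarithmically. `Adwin0`, `find_cut`, and the δ value are hypothetical.

```python
import math


def find_cut(window, delta):
    """Return a split index i where mean(window[:i]) and mean(window[i:])
    differ by at least eps_cut, or None if no split does."""
    n = len(window)
    delta_prime = delta / n
    total = sum(window)
    head = 0.0
    for i in range(1, n):
        head += window[i - 1]
        n0, n1 = i, n - i
        mu0, mu1 = head / n0, (total - head) / n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
        if abs(mu0 - mu1) >= eps_cut:
            return i
    return None


class Adwin0:
    """Naive ADWIN0: keep every item, test every split on each update."""

    def __init__(self, delta=0.002):
        self.delta = delta  # confidence parameter; smaller means fewer false alarms
        self.window = []    # oldest item first

    def update(self, x):
        """Append x in [0, 1]; return True if older data was dropped (drift)."""
        self.window.append(x)
        drift = False
        while len(self.window) >= 2 and find_cut(self.window, self.delta) is not None:
            self.window.pop(0)  # shrink from the old end until no split is significant
            drift = True
        return drift

    def mean(self):
        """Current estimate of the stream mean over the adaptive window."""
        return sum(self.window) / len(self.window)


# Hypothetical 0/1 stream ("session flagged as bot"): the rate jumps from 0 to 1.
det = Adwin0(delta=0.002)
stream = [0.0] * 200 + [1.0] * 100
alarms = [t for t, x in enumerate(stream) if det.update(x)]
print("first alarm at", alarms[0], "window length now", len(det.window))
```

The window length doubles as a diagnostic: after an alarm it shrinks to roughly the post-change data, so `mean()` tracks the new rate rather than a blend of old and new.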

⬜ P5 · Isolation-Based Anomaly Detection (Isolation Forest)

Liu, Ting & Zhou (2012) · ACM TKDD 6(1):3 (journal extension of ICDM 2008) · method · liu2012isolation · ✅ yes 🔗 https://www.lamda.nju.edu.cn/publication/tkdd11.pdf
Why: The go-to unsupervised anomaly scorer: linear time, low memory, and no distance or density computation. Flags anomalous sessions without labelled fraud. Use the fuller TKDD version.
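A from-scratch sketch of the isolation idea, assuming numeric feature tuples per session (the two features in the example are hypothetical). Each tree splits on a random feature at a random value between that feature's min and max, depth is capped at ceil(log2 ψ), and the score is s = 2^(-E[h(x)]/c(ψ)), where c(n) is the average path length of an unsuccessful binary-search-tree search. The defaults t = 100 trees and ψ = 256 follow the values the paper recommends; all function names are illustrative. For real use, scikit-learn's `sklearn.ensemble.IsolationForest` is the maintained implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points (normaliser)."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree: random feature, random split between its min and max."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # constant feature in this node: cannot split, stop here
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Edges to reach x's leaf, plus c(size) for the unresolved points stored there."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a subsample of size psi drawn without replacement."""
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """s = 2^(-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c(psi))


# Hypothetical standardised session features, plus one injected extreme session.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)] + [(8.0, 8.0)]
trees, psi = isolation_forest(data)
scores = [anomaly_score(p, trees, psi) for p in data]
print("highest-scoring index:", max(range(len(data)), key=scores.__getitem__))
```

The injected (8, 8) session should come out with the highest score. Per the paper, scores near 1 indicate anomalies, and if every score sits around 0.5 the sample has no distinct anomalies.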

⬜ P6 · Selective Review of Offline Change Point Detection Methods

Truong, Oudre & Vayatis (2020) · Signal Processing 167:107299 · survey · truong2020changepoint · ✅ yes 🔗 https://arxiv.org/abs/1801.00718
Why: A clear, structured review that casts offline methods (PELT, binary segmentation, window sliding, kernel costs) as cost function + search method + penalty under one formalism, with the companion ruptures Python library. Use it for retrospective change-point analysis on historical segments and for measuring detection delay (Difficulty knobs).
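A minimal pure-Python sketch of PELT with the L2 cost (sum of squared deviations from the segment mean) and a fixed linear penalty `pen` per change point, to show the review's cost + search + penalty decomposition in code. The function name, the penalty value, and the synthetic series are illustrative assumptions. With ruptures itself the equivalent is roughly `rpt.Pelt(model="l2", min_size=2, jump=1).fit(signal).predict(pen=20)` on a NumPy array; its output also appends `len(signal)` as a final breakpoint, whereas this sketch returns only interior change points.

```python
import random


def pelt_l2(signal, pen, min_size=2):
    """Offline change-point detection: PELT search, L2 cost, penalty `pen` per change.

    Returns sorted interior breakpoints; index b means a new segment starts at signal[b].
    """
    n = len(signal)
    if n < min_size:
        return []
    # Prefix sums give each segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for y in signal:
        s1.append(s1[-1] + y)
        s2.append(s2[-1] + y * y)

    def cost(a, b):
        # Sum of squared deviations of signal[a:b] from its own mean.
        seg = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg * seg / (b - a)

    best = {0: -pen}   # best[t]: optimal penalised cost of signal[:t]
    last = {0: 0}      # last[t]: start index of the final segment in that optimum
    candidates = [0]   # pruned set of admissible last change points
    for t in range(min_size, n + 1):
        options = [(best[tau] + cost(tau, t) + pen, tau)
                   for tau in candidates if t - tau >= min_size]
        best[t], last[t] = min(options)
        # PELT pruning (K = 0 for the L2 cost): drop tau that can never be optimal again.
        candidates = [tau for tau in candidates
                      if t - tau < min_size or best[tau] + cost(tau, t) <= best[t]]
        candidates.append(t)

    # Walk back from the end to recover the change points.
    bkps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            bkps.append(t)
    return sorted(bkps)


# Hypothetical historical segment: mean 0, then 5, then 0 again (unit noise).
rng = random.Random(0)
signal = ([rng.gauss(0.0, 1.0) for _ in range(50)]
          + [rng.gauss(5.0, 1.0) for _ in range(50)]
          + [rng.gauss(0.0, 1.0) for _ in range(50)])
print(pelt_l2(signal, pen=20.0))
```

On this synthetic series it should recover breakpoints at or next to 50 and 100. The penalty is the main knob: raising it trades missed small shifts for fewer spurious segments.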