20 · Domain — MOC
#seed #moc #domain
Up: plan
Event streams, clickstream, web analytics — the subject-matter substrate. This is where an engineering background is an asset.
Subject knowledge
- What is a clickstream — events, sessions, users, identity stitching
- Sessionization — windowing strategies, timeout choices, the boundary problem
- Event schema and drift — schema evolution, new event types, reprocessing
- Funnels and journeys — conversion paths, drop-off, multi-touch
- Attribution models — last-touch, position-based, data-driven; their failure modes
- Concept drift in production — seasonality, campaigns, bot waves; detection & retraining
- Online vs offline gap — why offline lift so often dies in the A/B test ⚠️ (a core trap)
- Bots, fraud, and invalid traffic — the anomaly side of the domain
- Privacy & consent constraints — GDPR, cookieless, what data I can actually use
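To make the sessionization entry concrete, here is a minimal sketch of gap-based sessionization with an inactivity timeout. The 30-minute timeout, the `(user_id, ts_seconds)` event shape, and the `sessionize` function name are illustrative assumptions, not something this note prescribes. The sample data puts one gap exactly on the timeout to show the boundary problem: whether the comparison is `>` or `>=` decides if that event opens a new session.

```python
from collections import defaultdict

# Assumption: 30 minutes of inactivity ends a session (a common default, not a rule).
SESSION_TIMEOUT_S = 30 * 60


def sessionize(events, timeout_s=SESSION_TIMEOUT_S):
    """Group (user_id, ts_seconds) pairs into per-user lists of sessions.

    A new session starts when the gap to the same user's previous event
    is strictly greater than timeout_s.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # events may arrive out of order
        user_sessions = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > timeout_s:
                user_sessions.append([cur])    # gap too long: open a new session
            else:
                user_sessions[-1].append(cur)  # still inside the current session
        sessions[user_id] = user_sessions
    return sessions


# Hypothetical sample events, timestamps in seconds.
events = [
    ("u1", 0),
    ("u1", 600),
    ("u1", 2400),  # gap of exactly 1800 s: stays in the session because of '>'
    ("u1", 4201),  # gap of 1801 s: starts a second session
    ("u2", 100),
]

result = sessionize(events)
print(result["u1"])
print(result["u2"])
```

Changing `>` to `>=` would move the event at 2400 into its own session, so the comparison operator is itself a modelling choice that should be written down alongside the timeout.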
Query & compute substrate
- DuckDB for local event analytics — Parquet, sessionize, funnels on a laptop
- ClickHouse — when local stops scaling (built for this exact workload)
- Polars over pandas — event tables get big
- Batch vs streaming decision — Kafka/Flink only if in-flight latency genuinely matters
→ The full local toolchain (install + why each tool) is documented in Offline compute stack.
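As a rough illustration of "funnels on a laptop", the sketch below counts users through an ordered three-step funnel in plain SQL. It uses Python's stdlib `sqlite3` purely as a stand-in so it runs with nothing installed. The query sticks to CTEs, joins, and aggregates, so it should port to DuckDB or ClickHouse with little change, but that is an assumption and untested here. The `events` table, its columns, and the `view`/`cart`/`purchase` step names are hypothetical.

```python
import sqlite3

# In-memory database as a stand-in for a local analytics engine.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id TEXT, ts INTEGER, event TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", 1, "view"), ("u1", 2, "cart"), ("u1", 3, "purchase"),
        ("u2", 1, "view"), ("u2", 5, "cart"),
        ("u3", 1, "cart"), ("u3", 2, "view"),  # cart before view: not a step-2 hit
        ("u4", 1, "view"),
    ],
)

# Each step keeps only users whose event happens strictly after the previous step.
FUNNEL_SQL = """
WITH s1 AS (
    SELECT user_id, MIN(ts) AS t1
    FROM events
    WHERE event = 'view'
    GROUP BY user_id
),
s2 AS (
    SELECT e.user_id, MIN(e.ts) AS t2
    FROM events e
    JOIN s1 ON e.user_id = s1.user_id
    WHERE e.event = 'cart' AND e.ts > s1.t1
    GROUP BY e.user_id
),
s3 AS (
    SELECT e.user_id, MIN(e.ts) AS t3
    FROM events e
    JOIN s2 ON e.user_id = s2.user_id
    WHERE e.event = 'purchase' AND e.ts > s2.t2
    GROUP BY e.user_id
)
SELECT (SELECT COUNT(*) FROM s1) AS viewed,
       (SELECT COUNT(*) FROM s2) AS carted,
       (SELECT COUNT(*) FROM s3) AS purchased
"""

viewed, carted, purchased = con.execute(FUNNEL_SQL).fetchone()
print(viewed, carted, purchased)
```

Enforcing step order through timestamps is what separates a funnel from three independent event counts: user `u3` has both a `view` and a `cart` event but drops out at step 2 because the cart came first.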