🧭 Research Vault β€” HOME (Map of Content)

Purpose: A living knowledge base to take me from software engineer to bonafide researcher in event streams, web analytics, and clickstream prediction β€” within one year, in a for-profit context.

How to use this: This is the top-level index (a "Map of Content"). Each [[bracketed link]] is a note to create in Obsidian. Don't fill it all in up front β€” grow it as you go. The structure matters more than the completeness. Plain Markdown, version-controlled with git.


The one-sentence thesis

An engineer asks "does it work?" A researcher asks "is it true, and why?" This entire vault exists to retrain that reflex.


Vault architecture

A folder-per-domain layout. Numeric prefixes keep them ordered. One idea per note; link liberally; let the graph emerge.

/Research-Vault
β”œβ”€β”€ 00_Home/                  ← this file + sub-MOCs
β”œβ”€β”€ 10_Foundations/           ← the math & fundamentals to actually learn
β”œβ”€β”€ 20_Domain/                ← clickstream / event-stream subject knowledge
β”œβ”€β”€ 30_Methods/               ← the 3 research areas + synthetic data craft
β”œβ”€β”€ 40_Experiments/           ← lab notebook: one note per experiment
β”œβ”€β”€ 50_Literature/            ← Zotero-linked paper notes
β”œβ”€β”€ 60_Craft/                 ← how to be a researcher (temperament, traps)
β”œβ”€β”€ 70_Tracks/                ← research tracks / project portfolio
β”œβ”€β”€ 80_Templates/             ← note templates (experiment, paper, track)
└── 90_Inbox/                 ← fleeting notes, capture-first, sort later

Top-level maps to build next: 10_Foundations-MOC Β· 20_Domain-MOC Β· 30_Methods-MOC Β· 60_Craft-MOC Β· 70_Tracks-MOC


10 Β· Foundations β€” go deep on basics, not 2026 fads

The fundamentals haven't changed in decades; the frameworks will. Learn the thing under the thing. For each, the bar is "can I derive/visualize it from memory and compute a tiny example by hand?" β€” not "have I read about it."

Anti-fad rule: be wary of any topic that's been hot for < 6 months. Learn the 40-year-old idea underneath it first.


20 Β· Domain β€” event streams, clickstream, web analytics

The subject-matter substrate. This is where my engineering background is an asset.

Query & compute substrate


30 Β· Methods β€” the 3 research areas + how to build ground truth

The core of the discipline. Each area maps to a different known pattern you plant in synthetic data β€” which is how you measure your algorithm honestly.

The three areas

Synthetic data: plant ground truth, measure recovery

The reason to build synthetic data is that you know the answer key. Generate realistic background, inject a known signal, score how well the algorithm recovers it.

First project that exercises all three: a Markov+Hawkes generator with one planted motif, one predictive feature, one mid-stream drift β€” then see which algorithm family catches its target. β†’ see 70_Tracks-MOC


40 Β· Experiments β€” the lab notebook

One note per experiment, created before you run it (hypothesis first). This is the single highest-leverage habit for becoming a researcher. Template: 80_Experiment-Template.

Index: Experiment-Log-MOC (running list, newest first)


50 Β· Literature β€” paper notes, Zotero-linked


60 Β· Craft β€” temperament > talent

The meta-skill. The part no one teaches directly. This is the section most likely to make or break the transition.

Mindset shifts (engineer β†’ researcher)

Workflow discipline

⚠️ The traps checklists (run these before trusting any number)


70 Β· Tracks β€” the research portfolio

Run a portfolio, not one bet: a couple of short-horizon wins that de-risk delivery + one deeper line. Convert every negative result into a business decision ("windows past 30 min don't help β†’ stop investing there"). Template: 80_Track-Template.


80 Β· Templates


The 12-month transition roadmap

Phased, deliberately front-loading the reflexes over the results.

Q1 β€” Foundations & the lab habit (months 1–3)

Q2 β€” First real track + the skepticism reflex (months 4–6)

Q3 β€” Depth + the causal turn (months 7–9)

Q4 β€” Communicate & compound (months 10–12)


How I'll know it worked (transition scorecard)

Track quarterly. The goal isn't output volume β€” it's the change in reflex.

Signal Engineer-me Researcher-me
First reaction to a great result ship it what's the bug?
Train/test split by row by user & time
Definition of "done" tests pass I understand why the number is what it is
A negative result failure information (often more than a positive)
The business goal predict the outcome know whether I should predict or cause it
My infra maximal, elegant minimal, fast to iterate, reproducible

Maintenance notes

Created as the seed of a living knowledge base. Delete nothing; grow everything.