Sequence modeling papers

growing · #literature #reading-list

Tight starter set — the arc from Markov next-click → RNN sessions → self-attention. Feeds Prediction and the starter track. OA legend: ✅ open · 🟡 free author copy · 🔒 paywalled. Read with How I read a paper.

🎯 Start here: vaswani2017attention. Both attention-based models on this list (SASRec, BERT4Rec) build directly on the Transformer, so reading it first makes them legible. (Eng-background alternative: start at rendle2010fpmc, the Markov baseline that connects to Markov chains and HMMs.) Path: vaswani → rendle → hidasi → kang → sun → wang

⬜ P1 · Attention Is All You Need

Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin (2017) · NeurIPS · vaswani2017attention · ✅ yes 🔗 https://arxiv.org/abs/1706.03762 Why: The Transformer: multi-head self-attention plus positional encodings, with no recurrence or convolution. The substrate under most modern sequential models.

⬜ P2 · Factorizing Personalized Markov Chains for Next-Basket Recommendation (FPMC)

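Rendle, Freudenthaler & Schmidt-Thieme (2010) · WWW · rendle2010fpmc · 🟡 author PDF 🔗 https://www.ismll.uni-hildesheim.de/pub/pdfs/RendleFreudenthaler2010-FPMC.pdf Why: The canonical Markov-chain baseline. It gives each user their own first-order item-to-item transition matrix and factorizes the resulting user × item × item cube, so one model captures both long-term taste (matrix factorization) and the last-basket transition (Markov chain). The baseline most neural sequential models are measured against.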

⬜ P3 · Session-based Recommendations with Recurrent Neural Networks (GRU4Rec)

Hidasi, Karatzoglou, Baltrunas & Tikk (2016) · ICLR · hidasi2016gru4rec · ✅ yes 🔗 https://arxiv.org/abs/1511.06939 Why: Brought RNNs to anonymous short-session prediction. Extract the session-parallel mini-batch training and the pairwise ranking losses (BPR, TOP1); both became standard.
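
Note to self on session-parallel mini-batches: a minimal sketch of the idea as I understand it, not the authors' code. Each batch slot walks through one session at a time; when a session runs out, the slot picks up the next unused session and its hidden state has to be reset. The function name, the `reset_slots` return value, and the choice to stop as soon as a slot cannot be refilled are my own assumptions.

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset_slots) triples.

    sessions: list of item-ID lists, one per session.
    reset_slots: slots whose RNN hidden state should be zeroed before
    this step, because the slot has just started a new session.
    """
    # A session needs at least two events to form one (input, target) pair.
    sessions = [s for s in sessions if len(s) >= 2]
    if len(sessions) < batch_size:
        raise ValueError("need at least batch_size usable sessions")

    slot_session = list(range(batch_size))  # session index read by each slot
    slot_pos = [0] * batch_size             # current position inside that session
    next_session = batch_size               # next unused session index
    reset_slots = list(range(batch_size))   # every slot starts fresh

    while True:
        inputs = [sessions[s][p] for s, p in zip(slot_session, slot_pos)]
        targets = [sessions[s][p + 1] for s, p in zip(slot_session, slot_pos)]
        yield inputs, targets, reset_slots

        reset_slots = []
        for slot in range(batch_size):
            slot_pos[slot] += 1
            # No next item left to predict: this slot's session is exhausted.
            if slot_pos[slot] + 1 >= len(sessions[slot_session[slot]]):
                if next_session >= len(sessions):
                    return  # assumption: stop once any slot cannot be refilled
                slot_session[slot] = next_session
                slot_pos[slot] = 0
                next_session += 1
                reset_slots.append(slot)
```

With sessions `[[1, 2, 3], [4, 5], [6, 7, 8, 9]]` and `batch_size=2`, slot 1 finishes its two-event session after the first step and switches to the third session, so the second step reports slot 1 as needing a reset.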

⬜ P4 · Self-Attentive Sequential Recommendation (SASRec)

Kang & McAuley (2018) · ICDM · kang2018sasrec · ✅ yes 🔗 https://arxiv.org/abs/1808.09781 Why: The inflection where self-attention replaced RNNs — a causal (unidirectional) Transformer over item sequences. Extract the masking + positional-embedding scheme.
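
Note to self on the masking: a toy sketch of causal (unidirectional) self-attention, where position t may only attend to positions 0..t. This is not SASRec itself; as simplifying assumptions, the input vectors serve as queries, keys, and values with no learned projections, there is a single head, and positional embeddings are left out.

```python
import math


def causal_self_attention(x):
    """x: list of T vectors (lists of floats). Returns T output vectors."""
    d = len(x[0])
    outputs = []
    for t, query in enumerate(x):
        # Causal mask: score only positions 0..t, never the future.
        scores = [
            sum(q * k for q, k in zip(query, x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(score - top) for score in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the visible value vectors.
        outputs.append(
            [sum(w * x[s][j] for s, w in enumerate(weights)) for j in range(d)]
        )
    return outputs
```

The property worth checking is that editing a later item leaves every earlier output unchanged, which is what makes next-item training on all prefixes of a sequence safe.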

⬜ P5 · BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Sun, Liu, Wu, Pei, Lin, Ou & Jiang (2019) · CIKM · sun2019bert4rec · ✅ yes 🔗 https://arxiv.org/abs/1904.06690 Why: Bidirectional (Cloze) masking. The contrast with SASRec's causal model sharpens your thinking on train-time vs inference-time leakage (Leakage checklist).
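
Note to self on the SASRec/BERT4Rec contrast: a small sketch of how the two objectives build training examples from the same item sequence. The function names, the `"[MASK]"` token string, and the `mask_prob` parameter are my own placeholders, not identifiers from either paper.

```python
import random

MASK = "[MASK]"  # placeholder token, an assumption for this sketch


def causal_examples(seq):
    """Causal style: predict item t+1 from the prefix ending at t."""
    return [(seq[: t + 1], seq[t + 1]) for t in range(len(seq) - 1)]


def cloze_example(seq, mask_prob, rng):
    """Cloze style: hide random items; the model sees context on both sides.

    Returns the masked sequence and a {position: hidden item} dict.
    """
    masked, targets = [], {}
    for pos, item in enumerate(seq):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = item
        else:
            masked.append(item)
    return masked, targets


def cloze_inference_input(history):
    """At inference there is no right-hand context: append one mask at the end."""
    return list(history) + [MASK]
```

The leakage question shows up in `cloze_example`: a masked position in the middle is predicted with later items visible, which is fine during training but means the held-out evaluation items must never appear in any training sequence.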

⬜ P6 · Sequential Recommender Systems: Challenges, Progress and Prospects

Wang, Hu, Wang, Cao, Sheng & Orgun (2019) · IJCAI survey track · wang2019sequential · ✅ yes 🔗 https://arxiv.org/abs/2001.04830 Why: The best single-document map — taxonomy (MC / RNN / attention / graph), datasets, evaluation conventions, open problems.