Probability for sequences

#evergreen #foundations #probability

Up: 10_Foundations-MOC

Bar: factor a joint with the chain rule, apply Bayes to a base-rate problem, and state the difference between MLE and MAP — all by hand.

The core objects

Independence vs conditional independence

Bayes — and the base-rate trap (worked)

$$P(H\mid E)=\frac{P(E\mid H)\,P(H)}{P(E)},\quad P(E)=\sum_h P(E\mid h)\,P(h)$$
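The formula can be sketched in Python as follows. This assumes a finite set of hypotheses stored in plain dicts. The function name `posterior` is illustrative and does not come from any library. The normalizer is computed with the sum over $h$ exactly as written, so the posteriors always sum to 1.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses (illustrative helper).

    prior:      dict mapping hypothesis h -> P(h), assumed to sum to 1
    likelihood: dict mapping hypothesis h -> P(E | h) for the observed evidence E
    Returns a dict mapping h -> P(h | E).
    """
    # Unnormalized posterior: P(E | h) * P(h) for each hypothesis.
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Law of total probability: P(E) = sum_h P(E | h) P(h).
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("P(E) is zero: the evidence is impossible under this model")
    return {h: j / evidence for h, j in joint.items()}


# The bot-detection numbers used in this section.
post = posterior(prior={"bot": 0.01, "human": 0.99},
                 likelihood={"bot": 0.9, "human": 0.05})
print(round(post["bot"], 3))  # 0.154
```

Changing only the `prior` dict is enough to explore how the base rate moves the answer.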

Bot detection. Prior $P(\text{bot})=0.01$. A bot fires "fast" requests with $P(\text{fast}\mid\text{bot})=0.9$; humans rarely do, with $P(\text{fast}\mid\text{human})=0.05$. We observe "fast": is it a bot?

$$P(\text{bot}\mid\text{fast})=\frac{0.9\cdot0.01}{0.9\cdot0.01+0.05\cdot0.99}=\frac{0.009}{0.009+0.0495}=\frac{0.009}{0.0585}\approx 0.154$$
Even a "90% accurate" signal (it fires on 90% of bots) is wrong about 85% of the times it fires here ($1-0.154\approx 0.85$), because bots are rare. Base rates dominate: the same arithmetic governs rare-conversion and fraud problems (Evaluation theory, Bots, fraud, and invalid traffic).
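The base-rate effect can also be checked by simulation. The following is a minimal Monte Carlo sketch that uses only the standard library. The visitor count and seed are arbitrary choices, and `simulate_fast_requests` is a hypothetical helper. It draws visitors from the stated prior and likelihoods, then measures what fraction of "fast" visitors are actually bots.

```python
import random


def simulate_fast_requests(n_visitors, p_bot=0.01, p_fast_bot=0.9,
                           p_fast_human=0.05, seed=0):
    """Return the fraction of 'fast' visitors that are bots (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    fast_bots = fast_humans = 0
    for _ in range(n_visitors):
        is_bot = rng.random() < p_bot                  # draw from the prior
        p_fast = p_fast_bot if is_bot else p_fast_human
        if rng.random() < p_fast:                      # visitor triggers "fast"
            if is_bot:
                fast_bots += 1
            else:
                fast_humans += 1
    return fast_bots / (fast_bots + fast_humans)


est = simulate_fast_requests(200_000)
print(f"{est:.3f}")  # near the exact 0.154; exact digits depend on the seed
```

Most "fast" hits come from the large human population. That is the same reason the exact posterior is low.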

MLE vs MAP (worked)

Estimate a 3-page transition distribution from counts $\mathbf{c}=(n_A,n_B,n_C)$, total $N$.
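The following is a sketch of both estimators, assuming a symmetric Dirichlet($\alpha$) prior for MAP. The note does not say which prior it uses, so this prior is an assumption. The MLE is $\hat\theta_k = n_k/N$. Under the assumed prior, the posterior is Dirichlet($n_k+\alpha$), and its mode gives $\hat\theta_k = (n_k+\alpha-1)/(N+K(\alpha-1))$ for $K$ categories and $\alpha \ge 1$. The counts below are hypothetical.

```python
def mle(counts):
    """Maximum-likelihood estimate of a categorical distribution: n_k / N."""
    total = sum(counts)
    return [n / total for n in counts]


def map_dirichlet(counts, alpha):
    """MAP estimate under an assumed symmetric Dirichlet(alpha) prior.

    Posterior is Dirichlet(n_k + alpha); its mode is
    (n_k + alpha - 1) / (N + K * (alpha - 1)). Valid here for alpha >= 1.
    """
    if alpha < 1:
        raise ValueError("this mode formula assumes alpha >= 1")
    total, k = sum(counts), len(counts)
    return [(n + alpha - 1) / (total + k * (alpha - 1)) for n in counts]


counts = [5, 5, 0]  # hypothetical (n_A, n_B, n_C), N = 10
print(mle(counts))                                           # [0.5, 0.5, 0.0]
print([round(p, 3) for p in map_dirichlet(counts, alpha=2)])  # [0.462, 0.462, 0.077]
```

With $\alpha = 1$ (a uniform prior), MAP coincides with MLE. Larger $\alpha$ pulls the estimate toward uniform and keeps the unseen page C away from probability zero.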

By-hand exercise (meets the bar)

  1. Redo the bot example with prior $0.1$ instead of $0.01$. (Answer: $\approx 0.667$; the base rate is the lever.)
  2. Factor $P(x_1,x_2,x_3)$ under a first-order Markov assumption and count how many parameters a full joint needs vs the Markov version for a 5-symbol alphabet, length-3 sequences.
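After working exercise 2 by hand, the count can be checked with a short sketch. The chain rule gives $P(x_1,x_2,x_3)=P(x_1)\,P(x_2\mid x_1)\,P(x_3\mid x_1,x_2)$. The first-order Markov assumption drops $x_1$ from the last factor, which gives $P(x_1)\,P(x_2\mid x_1)\,P(x_3\mid x_2)$. Each distribution over $V$ symbols has $V-1$ free parameters. The exercise does not say whether the two transition tables are shared (a time-homogeneous chain), so both cases are counted. The function names are illustrative.

```python
def full_joint_params(vocab, length):
    """Free parameters of an unrestricted joint over vocab**length sequences."""
    return vocab ** length - 1  # one probability per sequence, minus sum-to-one


def markov1_params(vocab, length, shared_transitions=True):
    """Free parameters of P(x_1) * prod_{t>=2} P(x_t | x_{t-1})."""
    initial = vocab - 1              # P(x_1)
    per_table = vocab * (vocab - 1)  # vocab rows, each with vocab-1 free params
    # A shared (time-homogeneous) chain uses one table; otherwise one per step.
    n_tables = min(1, length - 1) if shared_transitions else length - 1
    return initial + n_tables * per_table


V, T = 5, 3
print(full_joint_params(V, T))                         # 124
print(markov1_params(V, T, shared_transitions=True))   # 24
print(markov1_params(V, T, shared_transitions=False))  # 44
```

The full joint grows as $V^T$, while the Markov version grows only linearly in $T$ (or stays constant when transitions are shared).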