Markov chains and HMMs

evergreen#foundations#sequences

Up: 10_Foundations-MOC

Bar: write a transition matrix, solve for its stationary distribution by hand, and explain the forward and Viterbi recursions in one line each.

Markov chains

Markov property: the future depends on the present only — P(xt+1xt,xt1,)=P(xt+1xt)P(x_{t+1}\mid x_t,x_{t-1},\dots)=P(x_{t+1}\mid x_t). Encode it as a transition matrix PP where Pij=P(next=jnow=i)P_{ij}=P(\text{next}=j\mid\text{now}=i); rows sum to 1.

Worked: stationary distribution (by hand)

P=[0.90.10.50.5]P=\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}

Solve π1=0.9π1+0.5π2\pi_1 = 0.9\pi_1 + 0.5\pi_2 with π2=1π1\pi_2=1-\pi_1:

π1=0.9π1+0.5(1π1)  0.6π1=0.5  π1=560.833, π2=16.\pi_1 = 0.9\pi_1 + 0.5(1-\pi_1)\ \Rightarrow\ 0.6\pi_1 = 0.5\ \Rightarrow\ \pi_1=\tfrac56\approx0.833,\ \pi_2=\tfrac16.
Long-run, the system spends 5/6 of its time in state 1. The matrix is the ground truth you plant when generating synthetic page-transition data (Generating background traffic).

Hidden Markov Models

Now the state is hidden; you see only emissions. Parameters λ=(A,B,π)\lambda=(A,B,\boldsymbol\pi):

Three canonical problems:

Problem Question Algorithm
Likelihood P(Oλ)P(O\mid\lambda) of an observation sequence? Forward (sum)
Decoding most likely hidden path? Viterbi (max)
Learning best λ\lambda from data? Baum–Welch (EM)

Forward vs Viterbi (the only difference is sum vs max)

Forward: αt(j)=[iαt1(i)Aij]bj(ot)\text{Forward: } \alpha_t(j)=\Big[\sum_i \alpha_{t-1}(i)\,A_{ij}\Big]\,b_j(o_t)
Viterbi:  δt(j)=[maxiδt1(i)Aij]bj(ot)  (+ backpointers)\text{Viterbi: } \ \delta_t(j)=\Big[\max_i \delta_{t-1}(i)\,A_{ij}\Big]\,b_j(o_t)\ \ (+\text{ backpointers})

Forward sums over all paths to get total probability; Viterbi maximizes to recover the single best path (then traces backpointers). Both are O(TS2)O(T\cdot S^2) dynamic programming — linear in sequence length, the reason HMMs are practical.

Worked: one forward step

Hidden {R,S}\{R,S\}, π=(0.6,0.4)\boldsymbol\pi=(0.6,0.4), emission of "walk": bR(walk)=0.1, bS(walk)=0.6b_R(\text{walk})=0.1,\ b_S(\text{walk})=0.6.

α1(R)=0.60.1=0.06,α1(S)=0.40.6=0.24.\alpha_1(R)=0.6\cdot0.1=0.06,\qquad \alpha_1(S)=0.4\cdot0.6=0.24.
P(o1=walk)=0.06+0.24=0.30P(o_1=\text{walk})=0.06+0.24=0.30. Extend with the recursion for α2\alpha_2, etc.

Why this matters here

By-hand exercise (meets the bar)

  1. Find the stationary distribution of P=[0.70.30.40.6]P=\begin{bmatrix}0.7&0.3\\0.4&0.6\end{bmatrix}. (Answer: π=(4/7,3/7)\pi=(4/7,3/7).)
  2. Add a second emission and compute α2\alpha_2 for the HMM above for the sequence (walk, shop).