Hidden Markov Model
A Hidden Markov Model (HMM) is a simple sequence labeling model. It is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. By relating the observed events (for example, the words in a sentence) to the hidden states (for example, part-of-speech tags), it helps us find the most probable hidden state sequence (for example, the most relevant POS tag sequence for the given input sentence).
An HMM can be defined formally as a 5-tuple (Q, A, O, B, π), where each component is defined as follows:
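| Component | Detailed components | Description |
|---|---|---|
| Q | q_{1}, q_{2}, q_{3}, …, q_{N} | Set of N hidden states |
| A | a_{11}, a_{12}, …, a_{NN} | Transition probability matrix; a_{ij} is the probability of moving from state i to state j |
| O | o_{1}, o_{2}, …, o_{T} | A sequence of T observations |
| B | b_{i}(o_{t}) | Emission probabilities; b_{i}(o_{t}) is the probability of observation o_{t} being generated from state i |
| π | π_{1}, π_{2}, …, π_{N} | Prior (initial) probabilities of the N states |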
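One way to keep the five components together in code is a small container type. The sketch below is only illustrative: the class name `HMM`, its field names, and the two-state toy model with made-up numbers are assumptions introduced here, not part of the formal definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HMM:
    """Illustrative container for the 5-tuple (Q, A, O, B, pi)."""
    states: List[str]                          # Q: the N hidden states
    transitions: Dict[str, Dict[str, float]]   # A: transitions[i][j] = P(j | i)
    observations: List[str]                    # O: the observed sequence o_1 ... o_T
    emissions: Dict[str, Dict[str, float]]     # B: emissions[i][o] = b_i(o)
    initial: Dict[str, float]                  # pi: initial[i] = prior probability of state i


# Hypothetical two-state model; the numbers are made up to show the shape only.
toy = HMM(
    states=["X", "Y"],
    transitions={"X": {"X": 0.7, "Y": 0.3}, "Y": {"X": 0.4, "Y": 0.6}},
    observations=["a", "b", "a"],
    emissions={"X": {"a": 0.9, "b": 0.1}, "Y": {"a": 0.2, "b": 0.8}},
    initial={"X": 0.5, "Y": 0.5},
)

print(len(toy.states), "states,", len(toy.observations), "observations")
```

Nested dictionaries keyed by state name are used here instead of numeric arrays so that each lookup reads like the notation, for example `toy.transitions["X"]["Y"]` for a_{XY}.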


Understanding Hidden Markov Model - Example:
These components are explained with the following HMM. In this example, the states are the weather conditions (Hot, Wet, Cold) and the observations are the fabrics that we wear (Cotton, Nylon, Wool).
As per the given HMM,
- Q = set of states = {Hot, Wet, Cold}
- A = transition probability matrix

Transition probability matrix (rows: previous state; columns: current state)

| Previous state | Hot | Wet | Cold |
|---|---|---|---|
| Hot | 0.6 | 0.3 | 0.1 |
| Wet | 0.4 | 0.4 | 0.2 |
| Cold | 0.1 | 0.4 | 0.5 |

How to read this matrix? In this matrix, a_{ij} is the transition probability from state i to state j, represented as the conditional probability P(j | i). For example:
a_{11} = P(Hot | Hot) = 0.6
a_{23} = P(Cold | Wet) = 0.2
a_{31} = P(Hot | Cold) = 0.1
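The reading rule can be sketched as a lookup in which the first key is the previous state (the row) and the second key is the current state (the column). The variable `A` and the helper `transition` are names chosen for this illustration; the values are the ones from the matrix above.

```python
# A[previous][current] = P(current | previous), copied from the transition matrix.
A = {
    "Hot":  {"Hot": 0.6, "Wet": 0.3, "Cold": 0.1},
    "Wet":  {"Hot": 0.4, "Wet": 0.4, "Cold": 0.2},
    "Cold": {"Hot": 0.1, "Wet": 0.4, "Cold": 0.5},
}


def transition(prev_state: str, cur_state: str) -> float:
    """Return a_ij = P(cur_state | prev_state)."""
    return A[prev_state][cur_state]


print(transition("Hot", "Hot"))    # a_11 = P(Hot | Hot)   -> 0.6
print(transition("Wet", "Cold"))   # a_23 = P(Cold | Wet)  -> 0.2
print(transition("Cold", "Hot"))   # a_31 = P(Hot | Cold)  -> 0.1
```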
The transition probabilities from a single state to all states (including itself) sum to 1. In other words, the total weight of the arcs (or edges) going out of a state should be equal to 1. In our example:
P(Hot | Hot) + P(Wet | Hot) + P(Cold | Hot) = 0.6 + 0.3 + 0.1 = 1
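This property is easy to check programmatically. The following is a minimal sketch, assuming the matrix is stored as nested dictionaries; the helper name `is_row_stochastic` is hypothetical, and `math.isclose` is used because floating-point sums such as 0.6 + 0.3 + 0.1 may not be exactly 1.

```python
import math
from typing import Dict

# A[previous][current] = P(current | previous), copied from the transition matrix.
A = {
    "Hot":  {"Hot": 0.6, "Wet": 0.3, "Cold": 0.1},
    "Wet":  {"Hot": 0.4, "Wet": 0.4, "Cold": 0.2},
    "Cold": {"Hot": 0.1, "Wet": 0.4, "Cold": 0.5},
}


def is_row_stochastic(matrix: Dict[str, Dict[str, float]]) -> bool:
    """True if the outgoing probabilities of every state sum to 1 (within float tolerance)."""
    return all(math.isclose(sum(row.values()), 1.0) for row in matrix.values())


# Show the sum being checked for each state, then the overall result.
for state, row in A.items():
    print(state, ":", " + ".join(str(p) for p in row.values()))

print("All rows sum to 1:", is_row_stochastic(A))
```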
- O = sequence of observations = {Cotton, Nylon, Wool}
- B = emission probability matrix
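
Emission probability matrix (rows: hidden state; columns: observation)

| State | Cotton | Nylon | Wool |
|---|---|---|---|
| Hot | 0.8 | 0.5 | 0.05 |
| Wet | 0.15 | 0.4 | 0.2 |
| Cold | 0.05 | 0.1 | 0.75 |
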
The matrix above holds the emission probability values, represented as b_{i}(o_{t}). Here b_{i}(o_{t}) is the probability of an observation o_{t} being generated from state q_{i}. For example, P(Nylon | Hot) = 0.5 and P(Wool | Cold) = 0.75.
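The emission lookup can be sketched the same way as a transition lookup, with the hidden state as the first key and the observed fabric as the second. The names `B` and `emission` are chosen for this illustration. One caveat worth checking in code: an emission distribution conventionally sums to 1 over the observations for each state, but with the values as printed in the matrix the rows sum to 1.35, 0.75 and 0.9, while each column sums to 1. The sketch reports both sums and does not alter the values.

```python
import math

# B[state][fabric] = b_state(fabric) = P(fabric | state), copied as printed.
B = {
    "Hot":  {"Cotton": 0.8,  "Nylon": 0.5, "Wool": 0.05},
    "Wet":  {"Cotton": 0.15, "Nylon": 0.4, "Wool": 0.2},
    "Cold": {"Cotton": 0.05, "Nylon": 0.1, "Wool": 0.75},
}
FABRICS = ["Cotton", "Nylon", "Wool"]


def emission(state: str, fabric: str) -> float:
    """Return b_state(fabric), the probability that `state` emits `fabric`."""
    return B[state][fabric]


print(emission("Hot", "Nylon"))   # P(Nylon | Hot)  -> 0.5
print(emission("Cold", "Wool"))   # P(Wool | Cold)  -> 0.75

# Sum over observations for each state (rows) and over states for each fabric (columns).
row_sums = {state: sum(row.values()) for state, row in B.items()}
col_sums = {fabric: sum(B[state][fabric] for state in B) for fabric in FABRICS}

for state, total in row_sums.items():
    print("row", state, "sums to 1:", math.isclose(total, 1.0))
for fabric, total in col_sums.items():
    print("column", fabric, "sums to 1:", math.isclose(total, 1.0))
```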
- π = [π_{1}, π_{2}, …, π_{N}] = set of prior probabilities = [0.6, 0.3, 0.1]. Here, the values refer to the prior probabilities P(Hot) = 0.6, P(Wet) = 0.3, and P(Cold) = 0.1.
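With Q, A, B and π in place, the most probable hidden state sequence for a sequence of observed fabrics can be found with the Viterbi algorithm. The article does not name a decoding method, so Viterbi is brought in here as the standard choice, and the function and variable names are illustrative. The sketch uses the example's values exactly as printed, so the returned score is a product of those values.

```python
from typing import Dict, List, Tuple

STATES = ["Hot", "Wet", "Cold"]
PI = {"Hot": 0.6, "Wet": 0.3, "Cold": 0.1}
A = {  # A[previous][current]
    "Hot":  {"Hot": 0.6, "Wet": 0.3, "Cold": 0.1},
    "Wet":  {"Hot": 0.4, "Wet": 0.4, "Cold": 0.2},
    "Cold": {"Hot": 0.1, "Wet": 0.4, "Cold": 0.5},
}
B = {  # B[state][fabric]
    "Hot":  {"Cotton": 0.8,  "Nylon": 0.5, "Wool": 0.05},
    "Wet":  {"Cotton": 0.15, "Nylon": 0.4, "Wool": 0.2},
    "Cold": {"Cotton": 0.05, "Nylon": 0.1, "Wool": 0.75},
}


def viterbi(obs: List[str]) -> Tuple[List[str], float]:
    """Return the highest-scoring state path for `obs` and that path's score."""
    # best[t][s]: score of the best path that ends in state s at step t.
    best: List[Dict[str, float]] = [{s: PI[s] * B[s][obs[0]] for s in STATES}]
    # back[t][s]: the previous state on that best path.
    back: List[Dict[str, str]] = [{}]

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in STATES:
            # Pick the predecessor that maximizes (score so far) * (transition into s).
            prev = max(STATES, key=lambda p: best[t - 1][p] * A[p][s])
            best[t][s] = best[t - 1][prev] * A[prev][s] * B[s][obs[t]]
            back[t][s] = prev

    # Start from the best final state and follow the back-pointers.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


path, score = viterbi(["Cotton", "Nylon", "Wool"])
print(path, score)
```

For the observations Cotton, Nylon, Wool, this sketch returns the path Hot, Hot, Cold with a score of about 0.0108, which is 0.6 × 0.8 × 0.6 × 0.5 × 0.1 × 0.75.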