Advanced Database Management System - Tutorials and Notes: Evaluation problem of Hidden Markov Model

## Evaluation problem of Hidden Markov Model

### Evaluation problem (Likelihood):

Given an HMM λ = (A, B, π) and an observation sequence O = o1, o2, …, oT, how do we compute the probability that the observed sequence was produced by the model. In other words, it is about determining the likelihood of the observation sequence.

### Explain evaluation problem of HMM with example

It is just about how to calculate the probability of the observation sequence given the model. The calculation depends on the data given to us. For instance, if you are given observation sequence O = o1, o2, …, oT and
P(O|Q) = P(drink water | VB NN) = P(VB|start) * P(drink|VB) * P(NN|VB) * P(water|NN)
• no particular state sequence, then you need to find the same as above and do the sum over all possible hidden state sequences. For example, to get the P(drink water) without any particular hidden tag sequence, we need to compute the likelihood for each possibility and sum all the values as given below. This is because both the words “drink and “water are listed under the POS categories Noun and Verb.
P(drink water) = P(drink water | VB NN) + P(drink water | VB VB) + P(drink water | NN VB) + P(drink water | NN NN)
This way we can compute the probability for all hidden sequences to find the probability of the given observation sequence.

### What is the drawback in Evaluation (likelihood) problem of HMM?

How about in general for an HMM with N hidden states and a sequence of T observations? The above said calculations have to be done for NT different possibilities. Let us do simple calculations to understand the complexity;
• If number of tags N = 2, and number of words observed T = 2, then 22 = 4 possible likelihood estimates.
• If N = 6 and T = 4, then 64 = 1296 possible likelihood estimates.
• For longer sentences, it will be too high.

### What is the solution to handle Evaluation problem of HMM?

• Forward-Backward procedure

***********