Part-of-Speech Tagging with Hidden Markov Model
What is POS tagging?
Part-of-Speech
(POS) tagging is the process of assigning each word in an input text an
appropriate POS tag such as Noun, Verb, or Adjective. This task is considered
one of the disambiguation tasks in NLP, because many words in a language
have more than one part-of-speech. You can see this from the following
table:
Words   | POS tags
Bank    | Noun, Verb
Easy    | Adjective, Adverb
Process | Noun, Verb

POS tagging is also
called grammatical tagging or word-category disambiguation.
Why POS tagging?
It is one of the
subproblems in many NLP applications. For example,
 Machine translation – the correct POS tags of the input sentence must be identified to translate it correctly into another language.
 Word sense disambiguation – identifying the correct word category helps in identifying the correct meaning of a word.
 Named entity recognition – the task of identifying and classifying the named entities mentioned in a text. POS tagging is one of its preprocessing steps.
Many other NLP applications also use POS tagging as one of their preliminary
steps.
Different groups of POS tagging techniques
 Rule-based POS taggers
 Stochastic (probabilistic) POS taggers
Hidden Markov Model in POS tagging
HMM is a probabilistic
sequence model, and POS tagging is a sequence labeling problem: we need to
identify and assign each word the correct POS tag. A sequence model assigns a
label to each component in a sequence; it computes a probability distribution
over possible sequences of labels and chooses the best label sequence.
A hidden Markov
model (HMM) allows us to talk about both observed events (the words of the input
sentence) and hidden events (the POS tags), unlike a Markov chain, which models
only the probabilities of sequences of observable states.
Two important assumptions used by HMM
HMM uses two
assumptions to simplify the calculations. They are:
 Markov assumption: the probability of a state q_{n} (in the tagging problem, the hidden POS tag) depends only on the previous state q_{n-1}:
P(q_{n} | q_{1}, q_{2}, …, q_{n-1})
= P(q_{n} | q_{n-1})
 Output independence: the probability of an observation o_{n} (a word, in the tagging problem) depends only on the state q_{n} (hidden state) that produced the observation and not on any other states or observations:
P(o_{n} | q_{1}, …, q_{n}, …, q_{T},
o_{1}, …, o_{n}, …, o_{T}) = P(o_{n} | q_{n})
Likelihood of the observation sequence (Forward Algorithm)
Let us consider W
as the word sequence (observations/emissions) and T as the tag sequence (hidden
states). W consists of a sequence of observations (words) w_{1},
w_{2}, …, w_{n}, and T consists of a sequence
of hidden states (POS tags) t_{1}, t_{2},
…, t_{n}. Then the joint probability, i.e., P(W, T)
(also called the likelihood) can
be calculated using the two assumptions discussed above as follows:
P(W, T) = P(W | T) * P(T)    … (Eq. 1)
The observation (word
sequence) probability follows from the output independence assumption:
P(W | T) = P(w_{1} | t_{1}) * P(w_{2} | t_{2}) * … * P(w_{n} | t_{n}).
The state transition (POS
tag) probability follows from the bigram assumption on tags (the probability of a tag
depends only on its previous tag):
P(T) = P(t_{1} | t_{0}) * P(t_{2} | t_{1}) * … * P(t_{n} | t_{n-1}).
So, Eq. 1
can be expanded as follows, with observation probabilities followed by
transition probabilities:
P(W | T) * P(T) = P(w_{1} | t_{1})
* P(w_{2} | t_{2}) * … * P(w_{n} | t_{n}) * P(t_{1} | t_{0})
* P(t_{2} | t_{1})
* … * P(t_{n} | t_{n-1}),
where t_{0} denotes the start of the sentence.
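This factorization can be sketched as a short Python function. The probability tables A and B below are illustrative toy values (the same numbers used in the worked example in this article), and the "<s>" start marker standing in for t_{0} is an assumption of this sketch, not part of any standard API:

```python
# Sketch of the bigram HMM likelihood P(W | T) * P(T).
# A: transition probabilities P(tag | previous tag);
# B: emission probabilities P(word | tag).
# "<s>" is a hypothetical start-of-sentence marker standing in for t_0.

A = {"<s>": {"DT": 0.45}, "DT": {"JJ": 0.3}, "JJ": {"NN": 0.2}}
B = {"DT": {"the": 0.3}, "JJ": {"light": 0.002}, "NN": {"book": 0.003}}

def likelihood(words, tags, A, B):
    """P(W | T) * P(T) under the output-independence and bigram assumptions."""
    prob = 1.0
    prev = "<s>"  # t_0, the start state
    for w, t in zip(words, tags):
        prob *= B[t][w]      # emission term P(w_i | t_i)
        prob *= A[prev][t]   # transition term P(t_i | t_{i-1})
        prev = t
    return prob

p = likelihood(["the", "light", "book"], ["DT", "JJ", "NN"], A, B)
# p is about 4.86e-08
```

In practice the products of many small probabilities underflow, so real implementations sum log probabilities instead of multiplying raw ones.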
Likelihood estimation – Example:
Question:
Given the HMM λ =
(A, B, π) and the
word sequence W = “the light book”, find P(the light book, DT JJ NN), the joint probability
of the word sequence (observation) and the tag sequence.
Solution:
The initial
state probabilities (π),
state transition probabilities (A), and observation probabilities (B) are given.
P(the light book, DT
JJ NN)
= P(the | DT) *
P(light | JJ) * P(book | NN) * P(DT | start) * P(JJ | DT) * P(NN | JJ)
= 0.3 * 0.002 *
0.003 * 0.45 * 0.3 * 0.2
= 0.0000000486
= 4.86 * 10^{-8}
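The example above scores one fixed tag sequence. The forward algorithm named in the section heading goes a step further: it sums this quantity over every possible tag sequence to obtain the total observation likelihood P(W). A minimal sketch, reusing toy probability tables like those in the example (tag pairs absent from the tables are treated as probability 0):

```python
# Forward algorithm: after reading w_1..w_i, alpha[t] holds
# P(w_1, ..., w_i, tag_i = t); summing the final alphas gives P(W).
# A, B are toy tables matching the worked example above; "<s>" is a
# hypothetical marker standing in for the start state.

A = {"<s>": {"DT": 0.45}, "DT": {"JJ": 0.3}, "JJ": {"NN": 0.2}}
B = {"DT": {"the": 0.3}, "JJ": {"light": 0.002}, "NN": {"book": 0.003}}

def forward(words, tagset, A, B, start="<s>"):
    """Total likelihood P(W), summed over all tag sequences."""
    # Initialization: alpha_1(t) = P(t | start) * P(w_1 | t)
    alpha = {t: A.get(start, {}).get(t, 0.0) * B.get(t, {}).get(words[0], 0.0)
             for t in tagset}
    # Recursion: alpha_i(t) = [sum_s alpha_{i-1}(s) * P(t | s)] * P(w_i | t)
    for w in words[1:]:
        alpha = {t: sum(alpha[s] * A.get(s, {}).get(t, 0.0) for s in tagset)
                    * B.get(t, {}).get(w, 0.0)
                 for t in tagset}
    return sum(alpha.values())  # Termination: sum over the final tags

p = forward(["the", "light", "book"], ["DT", "JJ", "NN"], A, B)
```

Because these toy tables give nonzero probability to only one tag path (DT JJ NN), the forward sum here coincides with the single-path value 4.86 * 10^{-8} computed above; with fuller tables it would be strictly larger, since every tag path contributes.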
******************