How to calculate transition probabilities in an HMM using MLE? How to calculate emission probabilities in an HMM using MLE from a corpus? How to count and compute MLE from a corpus?
Question:
Given the following tagged corpus as the training corpus, answer the following questions using Maximum Likelihood Estimation (MLE).
Training corpus:
But/CC then/RB the/DT bear/NN thought/VBD that/IN the/DT fish/NN was/VBD too/RB small/JJ to/TO fill/VB the/DT stomach/NN of/IN bear/NN. He/PRP decided/VBD to/TO catch/VB a/DT bigger/JJR fish/NN. He/PRP let/VBD off/RP the/DT small/JJ fish/NN and/CC waited/VBD for/IN some/DT time/NN. Again/RB a/DT small/JJ fish/NN came/VBD and/CC he/PRP let/VBP it/PRP go/VB thinking/VBG that/IN the/DT small/JJ fish/NN would/MD not/RB fill/VB his/PRP$ belly/NN. This/DT way/NN he/PRP caught/VBD many/JJ small/JJ fish/NN, but/CC let/VB all/DT of/IN them/PRP go/VB off/RP. By/IN sunset/NN, the/DT bear/NN had/VBD not/RP caught/VBN any/DT big/JJ fish/NN.
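To count anything from this corpus, each word/TAG token first has to be split into a word and its tag. The following is a minimal Python sketch of that step, run on the first sentence of the corpus only. `parse_tagged` is a hypothetical helper name, and dropping the untagged "." and "," and lower-casing words are assumptions about preprocessing, not part of the question.

```python
from collections import Counter

def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs.

    Assumption: the untagged '.' and ',' glued to some tokens are dropped,
    and words are lower-cased so that 'He' and 'he' count as one word.
    """
    pairs = []
    for token in text.split():
        token = token.rstrip(".,")        # drop trailing untagged punctuation
        word, tag = token.rsplit("/", 1)  # split on the last '/'
        pairs.append((word.lower(), tag))
    return pairs

# First sentence of the training corpus.
sentence = ("But/CC then/RB the/DT bear/NN thought/VBD that/IN the/DT fish/NN "
            "was/VBD too/RB small/JJ to/TO fill/VB the/DT stomach/NN of/IN bear/NN.")

pairs = parse_tagged(sentence)
tag_counts = Counter(tag for _, tag in pairs)
print(len(pairs), tag_counts["DT"], tag_counts["NN"])  # 17 3 4
```

Applied to the whole corpus, the same counting gives C(DT) = 12, C(TO) = 2, C(VB) = 6 and C(NN) = 15.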
Tags used in this corpus:
CC – Conjunction
DT – Determiner
IN – Preposition
JJ – Adjective
JJR – Adjective, comparative
MD – Modal
NN – Noun
PRP – Personal pronoun
PRP$ – Possessive pronoun
RB – Adverb
RP – Particle
TO – to
VB – Verb
VBD – Verb, past tense
VBG – Verb, gerund or present participle
VBN – Verb, past participle
VBP – Verb, non-3rd person singular present

(a) Find the tag transition probabilities using MLE for the following:
(i) P(JJ | DT) (ii) P(VB | TO) (iii) P(NN | DT, JJ)
(b) Find the emission probabilities for the following:
(i) P(go | VB) (ii) P(fish | NN)
Answer:
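(a) We can compute the maximum likelihood estimates of the bigram and trigram transition probabilities as follows:
$$P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})} \qquad (1)$$
$$P(t_i \mid t_{i-2}, t_{i-1}) = \frac{C(t_{i-2}, t_{i-1}, t_i)}{C(t_{i-2}, t_{i-1})} \qquad (2)$$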
In Equation (1),
• $P(t_i \mid t_{i-1})$ – Probability of a tag $t_i$ given the previous tag $t_{i-1}$.
• $C(t_{i-1}, t_i)$ – Count of the tag sequence “$t_{i-1}\ t_i$” in the corpus, that is, how many times tag $t_i$ follows tag $t_{i-1}$ in the corpus.
• $C(t_{i-1})$ – Count of occurrences of tag $t_{i-1}$ in the corpus, that is, the frequency of tag $t_{i-1}$ in the corpus.
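Equation (1) translates almost directly into code. The sketch below is an illustration rather than part of the original solution: `bigram_transition_mle` is a hypothetical helper name, the toy tag sequence is made up, and `Fraction` is used only so that results print as exact ratios.

```python
from collections import Counter
from fractions import Fraction

def bigram_transition_mle(tags, prev_tag, tag):
    """MLE of P(tag | prev_tag) = C(prev_tag, tag) / C(prev_tag), as in Equation (1).

    The denominator counts every occurrence of prev_tag in the sequence.
    """
    bigram_counts = Counter(zip(tags, tags[1:]))  # C(t_{i-1}, t_i)
    unigram_counts = Counter(tags)                # C(t_{i-1})
    if unigram_counts[prev_tag] == 0:
        raise ValueError(f"tag {prev_tag!r} does not occur in the sequence")
    return Fraction(bigram_counts[(prev_tag, tag)], unigram_counts[prev_tag])

# Hypothetical toy tag sequence, for illustration only.
toy_tags = ["DT", "NN", "VBD", "DT", "JJ", "NN"]
print(bigram_transition_mle(toy_tags, "DT", "JJ"))  # 1/2
```

DT occurs twice in the toy sequence and is followed by JJ once, so the estimate is 1/2.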
In Equation (2),
• $P(t_i \mid t_{i-2}, t_{i-1})$ – Probability of a tag $t_i$ given the previous two tags $t_{i-2}$ and $t_{i-1}$.
• $C(t_{i-2}, t_{i-1}, t_i)$ – Count of the tag sequence “$t_{i-2}\ t_{i-1}\ t_i$” in the corpus, that is, how many times tag $t_i$ follows the pair of tags $t_{i-2}$ and $t_{i-1}$ in the corpus.
• $C(t_{i-2}, t_{i-1})$ – Count of occurrences of the tag sequence “$t_{i-2}\ t_{i-1}$” in the corpus.
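The trigram estimate of Equation (2) works the same way, with the pair of previous tags as the history. This is again a hedged sketch: `trigram_transition_mle` and the toy tag sequence are hypothetical, introduced only to illustrate the formula.

```python
from collections import Counter
from fractions import Fraction

def trigram_transition_mle(tags, tag2, tag1, tag):
    """MLE of P(tag | tag2, tag1) = C(tag2, tag1, tag) / C(tag2, tag1), as in Equation (2).

    tag2 plays the role of t_{i-2} and tag1 the role of t_{i-1}.
    """
    trigram_counts = Counter(zip(tags, tags[1:], tags[2:]))  # C(t_{i-2}, t_{i-1}, t_i)
    bigram_counts = Counter(zip(tags, tags[1:]))             # C(t_{i-2}, t_{i-1})
    history = bigram_counts[(tag2, tag1)]
    if history == 0:
        raise ValueError(f"tag pair {(tag2, tag1)!r} does not occur in the sequence")
    return Fraction(trigram_counts[(tag2, tag1, tag)], history)

# Hypothetical toy tag sequence, for illustration only.
toy_tags = ["DT", "JJ", "NN", "VBD", "DT", "JJ", "JJ", "NN"]
print(trigram_transition_mle(toy_tags, "DT", "JJ", "NN"))  # 1/2
```

Here “DT JJ” occurs twice and is followed by NN once and by JJ once, so each continuation gets probability 1/2.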
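Solution to exercise a(i):
Find the probability of tag JJ given the previous tag DT using MLE.
To find P(JJ | DT), we can apply Equation (1) to find the bigram probability using MLE.
In the corpus, the tag DT occurs 12 times, out of which 4 times it is followed by the tag JJ. Therefore,
$$P(\text{JJ} \mid \text{DT}) = \frac{C(\text{DT}, \text{JJ})}{C(\text{DT})} = \frac{4}{12} = \frac{1}{3} \approx 0.33$$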
Solution to exercise a(ii):
Find the probability of tag VB given the previous tag TO using MLE.
To find P(VB | TO), we can apply Equation (1) to find the bigram probability using MLE.
In the corpus, the tag TO occurs 2 times, and both times it is followed by the tag VB. Therefore,
$$P(\text{VB} \mid \text{TO}) = \frac{C(\text{TO}, \text{VB})}{C(\text{TO})} = \frac{2}{2} = 1$$
Solution to exercise a(iii):
Find the probability of tag NN given the previous two tags DT and JJ using MLE.
To find P(NN | DT, JJ), we can apply Equation (2) to find the trigram probability using MLE.
In the corpus, the tag sequence “DT JJ” occurs 4 times, and all 4 times it is followed by the tag NN. Therefore,
$$P(\text{NN} \mid \text{DT}, \text{JJ}) = \frac{C(\text{DT}, \text{JJ}, \text{NN})}{C(\text{DT}, \text{JJ})} = \frac{4}{4} = 1$$
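The three answers in part (a) can be checked mechanically. This sketch assumes the whole corpus is read as one continuous tag stream, with no sentence-start or sentence-end markers (which is how the hand counts in the solutions appear to have been made), and that the untagged "." and "," are simply stripped.

```python
from collections import Counter
from fractions import Fraction

# The tagged training corpus from the question.
CORPUS = (
    "But/CC then/RB the/DT bear/NN thought/VBD that/IN the/DT fish/NN "
    "was/VBD too/RB small/JJ to/TO fill/VB the/DT stomach/NN of/IN bear/NN. "
    "He/PRP decided/VBD to/TO catch/VB a/DT bigger/JJR fish/NN. "
    "He/PRP let/VBD off/RP the/DT small/JJ fish/NN and/CC waited/VBD "
    "for/IN some/DT time/NN. Again/RB a/DT small/JJ fish/NN came/VBD "
    "and/CC he/PRP let/VBP it/PRP go/VB thinking/VBG that/IN the/DT "
    "small/JJ fish/NN would/MD not/RB fill/VB his/PRP$ belly/NN. "
    "This/DT way/NN he/PRP caught/VBD many/JJ small/JJ fish/NN, "
    "but/CC let/VB all/DT of/IN them/PRP go/VB off/RP. By/IN sunset/NN, "
    "the/DT bear/NN had/VBD not/RP caught/VBN any/DT big/JJ fish/NN."
)

# Tag sequence of the whole corpus as one continuous stream (assumption:
# no sentence-boundary tags; untagged '.' and ',' are stripped).
tags = [token.rstrip(".,").rsplit("/", 1)[1] for token in CORPUS.split()]

unigrams = Counter(tags)                           # C(t)
bigrams = Counter(zip(tags, tags[1:]))             # C(t_{i-1}, t_i)
trigrams = Counter(zip(tags, tags[1:], tags[2:]))  # C(t_{i-2}, t_{i-1}, t_i)

p_jj_given_dt = Fraction(bigrams[("DT", "JJ")], unigrams["DT"])                   # Equation (1)
p_vb_given_to = Fraction(bigrams[("TO", "VB")], unigrams["TO"])                   # Equation (1)
p_nn_given_dt_jj = Fraction(trigrams[("DT", "JJ", "NN")], bigrams[("DT", "JJ")])  # Equation (2)
print(p_jj_given_dt, p_vb_given_to, p_nn_given_dt_jj)  # 1/3 1 1
```

Note that “a/DT bigger/JJR” does not count towards C(DT, JJ), because JJR and JJ are distinct tags.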
(b) We can compute the Maximum Likelihood Estimate of the emission probability as follows:
$$P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)} \qquad (3)$$
In Equation (3),
• $P(w_i \mid t_i)$ – Probability of a word $w_i$ given the tag $t_i$ that is associated with the word.
• $C(t_i, w_i)$ – Count of occurrences of word $w_i$ with the associated tag $t_i$ in the corpus.
• $C(t_i)$ – Count of occurrences of tag $t_i$ in the corpus.
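Equation (3) needs word/tag pair counts rather than tag sequences. A minimal sketch, assuming the corpus has already been split into (word, tag) tuples; `emission_mle` and the toy data are hypothetical and only illustrate the formula.

```python
from collections import Counter
from fractions import Fraction

def emission_mle(pairs, word, tag):
    """MLE of P(word | tag) = C(tag, word) / C(tag), as in Equation (3).

    pairs is a list of (word, tag) tuples taken from a tagged corpus.
    """
    pair_counts = Counter((t, w) for w, t in pairs)  # C(t_i, w_i)
    tag_counts = Counter(t for _, t in pairs)        # C(t_i)
    if tag_counts[tag] == 0:
        raise ValueError(f"tag {tag!r} does not occur in the corpus")
    return Fraction(pair_counts[(tag, word)], tag_counts[tag])

# Hypothetical toy tagged data, for illustration only.
toy_pairs = [("the", "DT"), ("fish", "NN"), ("swam", "VBD"), ("a", "DT"), ("bear", "NN")]
print(emission_mle(toy_pairs, "fish", "NN"))  # 1/2
```

A word never seen with a tag gets probability 0 under plain MLE; smoothing is a separate topic not covered here.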
Solution to exercise b(i):
Find the Maximum Likelihood Estimate of the emission probability P(go | VB).
To find the MLE of the emission probability P(go | VB), we can apply Equation (3) as follows.
In the corpus, the tag VB occurs 6 times, and 2 of those times it is associated with the word “go”. Therefore,
$$P(\text{go} \mid \text{VB}) = \frac{C(\text{VB}, \text{go})}{C(\text{VB})} = \frac{2}{6} = \frac{1}{3} \approx 0.33$$
[How to read P(go | VB)? – If we are going to generate the tag VB, how likely is it that it will be associated with the word “go”?]
Solution to exercise b(ii):
Find the Maximum Likelihood Estimate of the emission probability P(fish | NN).
To find the MLE of the emission probability P(fish | NN), we can apply Equation (3) as follows.
In the corpus, the tag NN occurs 15 times, and 7 of those times it is associated with the word “fish”. Therefore,
$$P(\text{fish} \mid \text{NN}) = \frac{C(\text{NN}, \text{fish})}{C(\text{NN})} = \frac{7}{15} \approx 0.47$$
[How to read P(fish | NN)? – If we are going to generate the tag NN, how likely is it that it will be associated with the word “fish”?]
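Both answers in part (b) can be checked from the corpus in the same way. This is a sketch under stated assumptions: the untagged "." and "," are stripped, and words are lower-cased before counting (neither step changes the counts for “go” or “fish”, which always appear in lower case).

```python
from collections import Counter
from fractions import Fraction

# The tagged training corpus from the question.
CORPUS = (
    "But/CC then/RB the/DT bear/NN thought/VBD that/IN the/DT fish/NN "
    "was/VBD too/RB small/JJ to/TO fill/VB the/DT stomach/NN of/IN bear/NN. "
    "He/PRP decided/VBD to/TO catch/VB a/DT bigger/JJR fish/NN. "
    "He/PRP let/VBD off/RP the/DT small/JJ fish/NN and/CC waited/VBD "
    "for/IN some/DT time/NN. Again/RB a/DT small/JJ fish/NN came/VBD "
    "and/CC he/PRP let/VBP it/PRP go/VB thinking/VBG that/IN the/DT "
    "small/JJ fish/NN would/MD not/RB fill/VB his/PRP$ belly/NN. "
    "This/DT way/NN he/PRP caught/VBD many/JJ small/JJ fish/NN, "
    "but/CC let/VB all/DT of/IN them/PRP go/VB off/RP. By/IN sunset/NN, "
    "the/DT bear/NN had/VBD not/RP caught/VBN any/DT big/JJ fish/NN."
)

# (word, tag) pairs; untagged '.' and ',' stripped, words lower-cased
# (both are preprocessing assumptions, not part of the question).
pairs = []
for token in CORPUS.split():
    word, tag = token.rstrip(".,").rsplit("/", 1)
    pairs.append((word.lower(), tag))

tag_counts = Counter(tag for _, tag in pairs)  # C(t_i)
pair_counts = Counter(pairs)                   # C(t_i, w_i), keyed as (word, tag)

p_go_given_vb = Fraction(pair_counts[("go", "VB")], tag_counts["VB"])
p_fish_given_nn = Fraction(pair_counts[("fish", "NN")], tag_counts["NN"])
print(p_go_given_vb, p_fish_given_nn)  # 1/3 7/15
```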