Natural Language Processing MCQ - Bigram probability calculation - Solved exercise
1. Suppose we are given a mini-corpus consisting of four sentences. Here <s> and </s> are special symbols that mark the beginning and end of a sentence, respectively. Using the bigram probabilities given in the table that follows the sentences, arrange the sentences of the corpus in ascending order of their probabilities under the bigram model.
s1: <s>Paris is beautiful</s>
s2: <s>Is Paris in Europe</s>
s3: <s>Paris is in France</s>
s4: <s>Paris is a city</s>
| previous \ next | Paris | is   | beautiful | in   | Europe | France | city  | a    | </s> |
| <s>             | 0.1   | 0.15 |           |      | 0.17   |        |       |      |      |
| Paris           |       | 0.15 |           | 0.25 |        |        | 0     |      |      |
| is              | 0.1   |      | 0.5       | 0.15 |        |        |       | 0.15 |      |
| beautiful       |       |      |           |      |        |        |       |      | 0.25 |
| in              |       |      | 0         |      | 0.15   | 0.75   |       |      |      |
| Europe          |       |      |           |      |        |        |       |      | 0.2  |
| France          |       |      | 0         |      |        |        | 0.005 |      | 0.25 |
| city            | 0.04  | 0.15 | 0         | 0.12 | 0      | 0.15   | 0     |      | 0.5  |
| a               | 0     | 0    | 0.1       | 0    | 0      | 0      | 0.15  | 0    | 0    |
a) s1, s2, s3, s4
b) s3, s2, s1, s4
c) s2, s4, s3, s1
d) s3, s2, s4, s1
Answer: (c) s2, s4, s3, s1
Under the bigram model, the probability of a sentence is the product of the probabilities of each word given the word before it, including the <s> and </s> markers.
s1: <s>Paris is beautiful</s>
P(<s>Paris is beautiful</s>) = P(Paris|<s>) * P(is|Paris) * P(beautiful|is) * P(</s>|beautiful)
= 0.10 * 0.15 * 0.50 * 0.25 = 1.875 * 10^-3 = 0.001875
s2: <s>Is Paris in Europe</s>
P(<s>Is Paris in Europe</s>) = P(Is|<s>) * P(Paris|Is) * P(in|Paris) * P(Europe|in) * P(</s>|Europe)
= 0.15 * 0.10 * 0.25 * 0.15 * 0.20 = 0.1125 * 10^-3 = 0.0001125
(The capitalized "Is" is looked up under the table entry "is".)
s3: <s>Paris is in France</s>
P(<s>Paris is in France</s>) = P(Paris|<s>) * P(is|Paris) * P(in|is) * P(France|in) * P(</s>|France)
= 0.10 * 0.15 * 0.15 * 0.75 * 0.25 = 0.421875 * 10^-3 = 0.000421875
s4: <s>Paris is a city</s>
P(<s>Paris is a city</s>) = P(Paris|<s>) * P(is|Paris) * P(a|is) * P(city|a) * P(</s>|city)
= 0.10 * 0.15 * 0.15 * 0.15 * 0.50 = 0.16875 * 10^-3 = 0.00016875
Since 0.0001125 < 0.00016875 < 0.000421875 < 0.001875, the ascending order is s2, s4, s3, s1.
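The hand calculation can be cross-checked with a short Python sketch. It is only an illustration, not part of the original exercise: the names BIGRAM, SENTENCES and sentence_probability are made up here, only the table entries needed for s1 to s4 are copied in, words are lowercased so that "Is" matches the table entry "is", and a bigram missing from the dictionary is assumed to have probability 0.

```python
# Bigram probabilities P(next | previous), copied from the table in the question.
# Only the entries needed for s1..s4 are listed (an assumption of this sketch).
BIGRAM = {
    ("<s>", "paris"): 0.10,
    ("<s>", "is"): 0.15,
    ("paris", "is"): 0.15,
    ("paris", "in"): 0.25,
    ("is", "paris"): 0.10,
    ("is", "beautiful"): 0.50,
    ("is", "in"): 0.15,
    ("is", "a"): 0.15,
    ("in", "europe"): 0.15,
    ("in", "france"): 0.75,
    ("a", "city"): 0.15,
    ("beautiful", "</s>"): 0.25,
    ("europe", "</s>"): 0.20,
    ("france", "</s>"): 0.25,
    ("city", "</s>"): 0.50,
}

SENTENCES = {
    "s1": "Paris is beautiful",
    "s2": "Is Paris in Europe",
    "s3": "Paris is in France",
    "s4": "Paris is a city",
}


def sentence_probability(sentence, table=BIGRAM):
    """Multiply P(word | previous word) over the sentence, with <s> and </s> added."""
    # Lowercasing is an assumption so that "Is" is looked up as "is".
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        # Assumption: a bigram not listed in the table has probability 0.
        prob *= table.get((prev, nxt), 0.0)
    return prob


probs = {name: sentence_probability(text) for name, text in SENTENCES.items()}
ranking = sorted(probs, key=probs.get)  # ascending order of probability

for name in ranking:
    print(name, probs[name])
print(ranking)  # ['s2', 's4', 's3', 's1']
```

For longer sentences, summing log-probabilities instead of multiplying raw probabilities is a common way to avoid floating-point underflow; it gives the same ranking as long as no bigram has probability 0.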