Multiple-choice questions in NLP: a solved Natural Language Processing MCQ on the bigram model, showing how to calculate the probability of a sentence from bigram probabilities (maximum likelihood estimates obtained from corpus statistics).
Natural Language Processing MCQ: Bigram Probability Calculation (Solved Exercise)
1. Suppose we are given a mini-corpus consisting of four sentences. Here <s> and </s> are special symbols that mark the beginning and end of each sentence, respectively.
Arrange the sentences of the corpus in ascending order of their probabilities under the bigram model, using the bigram probabilities given in the table below.
s1: <s>Paris is beautiful</s>
s2: <s>Is Paris in Europe</s>
s3: <s>Paris is in France</s>
s4: <s>Paris is a city</s>
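
As a reminder of what "probability under the bigram model" means, the sentence probability can be written as a product of bigram probabilities. The symbols w_i and n below are notation introduced here for illustration, not taken from the original question:

```latex
P(w_1 \dots w_n) = \prod_{i=1}^{n+1} P(w_i \mid w_{i-1}),
\qquad w_0 = \texttt{<s>}, \quad w_{n+1} = \texttt{</s>}
```

Each factor is looked up in the bigram table, with w_{i-1} as the preceding word and w_i as the following word.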

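Bigram probabilities P(column word | row word): each row is the preceding word and each column is the word that follows it. Blank cells are not specified.

| Previous \ Next | Paris | is | beautiful | in | Europe | France | city | a | </s> |
|---|---|---|---|---|---|---|---|---|---|
| <s> | 0.1 | 0.15 | | | 0.17 | | | | |
| Paris | | 0.15 | | 0.25 | | | 0 | | |
| is | 0.1 | | 0.5 | 0.15 | | | | 0.15 | |
| beautiful | | | | | | | | | 0.25 |
| in | | | 0 | | 0.15 | 0.75 | | | |
| Europe | | | | | | | | | 0.2 |
| France | | | 0 | | | | 0.005 | | 0.25 |
| city | 0.04 | 0.15 | 0 | 0.12 | 0 | 0.15 | 0 | | 0.5 |
| a | 0 | 0 | 0.1 | 0 | 0 | 0 | 0.15 | 0 | 0 |
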
a) s1, s2, s3, s4
b) s3, s2, s1, s4
c) s2, s4, s3, s1
d) s3, s2, s4, s1
Answer: (c) s2, s4, s3, s1
s1: <s>Paris is beautiful</s>
P(<s>Paris is beautiful</s>) = P(Paris | <s>) * P(is | Paris) * P(beautiful | is) * P(</s> | beautiful)
= 0.10 * 0.15 * 0.50 * 0.25 = 1.875 * 10^{-3} = 0.001875
s2: <s>Is Paris in Europe</s>
P(<s>Is Paris in Europe</s>) = P(Is | <s>) * P(Paris | Is) * P(in | Paris) * P(Europe | in) * P(</s> | Europe)
= 0.15 * 0.10 * 0.25 * 0.15 * 0.20 = 0.1125 * 10^{-3} = 0.0001125
(The lookup is case-insensitive, so "Is" uses the table entries for "is".)
s3: <s>Paris is in France</s>
P(<s>Paris is in France</s>) = P(Paris | <s>) * P(is | Paris) * P(in | is) * P(France | in) * P(</s> | France)
= 0.10 * 0.15 * 0.15 * 0.75 * 0.25 = 0.421875 * 10^{-3} = 0.000421875
s4: <s>Paris is a city</s>
P(<s>Paris is a city</s>) = P(Paris | <s>) * P(is | Paris) * P(a | is) * P(city | a) * P(</s> | city)
= 0.10 * 0.15 * 0.15 * 0.15 * 0.50 = 0.16875 * 10^{-3} = 0.00016875
In ascending order of probability: s2 (0.0001125) < s4 (0.00016875) < s3 (0.000421875) < s1 (0.001875).
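
The calculation above can be checked with a short script. This is a minimal sketch, not part of the original exercise: the names BIGRAM, SENTENCES and sentence_probability are hypothetical, only the table entries needed for s1 to s4 are copied in, words are lower-cased so that "Is" matches "is", and any bigram not listed is assumed to have probability 0.

```python
from math import prod

# Bigram probabilities from the table, stored as BIGRAM[previous][next].
# Only the entries needed for s1..s4 are included (illustrative subset).
BIGRAM = {
    "<s>": {"paris": 0.10, "is": 0.15},
    "paris": {"is": 0.15, "in": 0.25},
    "is": {"paris": 0.10, "beautiful": 0.50, "in": 0.15, "a": 0.15},
    "beautiful": {"</s>": 0.25},
    "in": {"europe": 0.15, "france": 0.75},
    "europe": {"</s>": 0.20},
    "france": {"</s>": 0.25},
    "a": {"city": 0.15},
    "city": {"</s>": 0.50},
}

SENTENCES = {
    "s1": "Paris is beautiful",
    "s2": "Is Paris in Europe",
    "s3": "Paris is in France",
    "s4": "Paris is a city",
}


def sentence_probability(sentence):
    """Multiply the bigram probabilities along the sentence, including <s> and </s>."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    # Assumption: a bigram missing from BIGRAM contributes probability 0.
    return prod(
        BIGRAM.get(prev, {}).get(nxt, 0.0)
        for prev, nxt in zip(tokens, tokens[1:])
    )


probs = {name: sentence_probability(text) for name, text in SENTENCES.items()}
ranking = sorted(probs, key=probs.get)  # ascending order of probability
print(ranking)  # ['s2', 's4', 's3', 's1']
```

The ranking matches option (c). Because the values are floating-point products, compare them against the hand-computed figures with a small tolerance rather than exact equality.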