Multiple choice questions in NLP, Natural Language Processing solved MCQ, bigram model: how to calculate bigram probabilities using corpus statistics

## Natural Language Processing MCQ - Bigram Probability Calculation - Solved Exercise

1. Let us suppose that we are given a mini-corpus consisting of four sentences. Here <s> and </s> are special symbols used to mark the beginning and end of the sentences respectively.

## Arrange the sentences of the corpus in ascending order of their probabilities under the bigram model.

s1: <s>Paris is beautiful</s>

s2: <s>Is Paris in Europe</s>

s3: <s>Paris is in France</s>

s4: <s>Paris is a city</s>

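Bigram probabilities P(next word | previous word). Each row is a previous word; its non-empty cells are listed left to right in the column order Paris, is, beautiful, in, Europe, France, city, a, `</s>`.

| Previous word | Probabilities |
|---|---|
| `<s>` | 0.1, 0.15, 0.17 |
| Paris | 0.15, 0.25, 0 |
| is | 0.1, 0.5, 0.15, 0.15 |
| beautiful | 0.25 |
| in | 0, 0.15, 0.75 |
| Europe | 0.2 |
| France | 0, 0.005, 0.25 |
| city | 0.04, 0.15, 0, 0.12, 0, 0.15, 0, 0.5 |
| a | 0, 0, 0.1, 0, 0, 0, 0.15, 0, 0 |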

a)   s1, s2, s3, s4

b)   s3, s2, s1, s4

c)    s2, s4, s3, s1

d)  s3, s2, s4, s1

Answer: (c) s2, s4, s3, s1

Under the bigram model, the probability of a sentence is the product of the probabilities of each token given the token before it, starting from `<s>` and ending with `</s>`. The sentence-initial "Is" in s2 is looked up as "is".

s1: Paris is beautiful

P(Paris is beautiful) = P(Paris|`<s>`) × P(is|Paris) × P(beautiful|is) × P(`</s>`|beautiful)

P(Paris is beautiful) = 0.10 × 0.15 × 0.50 × 0.25 = 1.875 × 10⁻³ = 0.001875

s2: Is Paris in Europe

P(Is Paris in Europe) = P(Is|`<s>`) × P(Paris|Is) × P(in|Paris) × P(Europe|in) × P(`</s>`|Europe)

P(Is Paris in Europe) = 0.15 × 0.10 × 0.25 × 0.15 × 0.20 = 0.1125 × 10⁻³ = 0.0001125

s3: Paris is in France

P(Paris is in France) = P(Paris|`<s>`) × P(is|Paris) × P(in|is) × P(France|in) × P(`</s>`|France)

P(Paris is in France) = 0.10 × 0.15 × 0.15 × 0.75 × 0.25 = 0.421875 × 10⁻³ = 0.000421875

s4: Paris is a city

P(Paris is a city) = P(Paris|`<s>`) × P(is|Paris) × P(a|is) × P(city|a) × P(`</s>`|city)

P(Paris is a city) = 0.10 × 0.15 × 0.15 × 0.15 × 0.50 = 0.16875 × 10⁻³ = 0.00016875

Since 0.0001125 < 0.00016875 < 0.000421875 < 0.001875, the ascending order is s2, s4, s3, s1.
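The calculation above can be checked with a short script. The following is a minimal sketch, not part of the original exercise: the dictionary `BIGRAM` holds only the bigram probabilities that the worked solution uses, and the names `BIGRAM`, `SENTENCES`, and `sentence_probability` are illustrative. It assumes tokens are lower-cased (so "Is" is looked up as "is") and that any bigram missing from the dictionary has probability 0, with no smoothing.

```python
from math import prod  # available in Python 3.8+

# P(next | previous) for the bigrams used in the worked solution.
# Keys are (previous, next); tokens are lower-cased by assumption.
BIGRAM = {
    ("<s>", "paris"): 0.10,
    ("<s>", "is"): 0.15,
    ("paris", "is"): 0.15,
    ("paris", "in"): 0.25,
    ("is", "paris"): 0.10,
    ("is", "beautiful"): 0.50,
    ("is", "in"): 0.15,
    ("is", "a"): 0.15,
    ("in", "europe"): 0.15,
    ("in", "france"): 0.75,
    ("a", "city"): 0.15,
    ("beautiful", "</s>"): 0.25,
    ("europe", "</s>"): 0.20,
    ("france", "</s>"): 0.25,
    ("city", "</s>"): 0.50,
}

SENTENCES = {
    "s1": "Paris is beautiful",
    "s2": "Is Paris in Europe",
    "s3": "Paris is in France",
    "s4": "Paris is a city",
}


def sentence_probability(sentence, bigram=BIGRAM):
    """Multiply P(w_i | w_{i-1}) over the sentence, including <s> and </s>."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    # zip(tokens, tokens[1:]) yields each (previous, next) pair in order.
    # Bigrams absent from the dictionary are treated as probability 0.
    return prod(bigram.get(pair, 0.0) for pair in zip(tokens, tokens[1:]))


probs = {name: sentence_probability(text) for name, text in SENTENCES.items()}
ranking = sorted(probs, key=probs.get)  # ascending order of probability

for name in ranking:
    print(name, probs[name])
print(ranking)  # ['s2', 's4', 's3', 's1']
```

For longer sentences, summing log-probabilities instead of multiplying raw probabilities is the usual way to avoid floating-point underflow; the plain product is kept here so the code mirrors the hand calculation.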

