Natural Language Processing MCQ - Find the perplexity of a language model
1. Consider the following corpus:
S1: You have five minutes remaining till the end of the test
S2: You have submitted the test
S3: You are given five marks for the correct answer
Suppose that sentence S2 of the given corpus is used as the test data. What is the perplexity of S2? Assume that a bigram language model estimated from the corpus is being used.
(a) 1.14
(b) 1.42
(c) 1.35
(d) 1.43
Answer: (c) 1.35
Let us find the probability of S2, with every sentence padded by the markers <s> and </s>:
P(S2) = P("<s> You have submitted the test </s>") = P(You|<s>) * P(have|You) * P(submitted|have) * P(the|submitted) * P(test|the) * P(</s>|test)
P(You|<s>) = Count(<s>, You)/Count(<s>) = 3/3 = 1
[Hint: Numerator: the bigram "<s> You" occurs 3 times in the corpus, because "You" starts all three sentences. Denominator: the unigram "<s>" (start symbol) occurs 3 times in the corpus.]
P(have|You) = Count(You, have)/Count(You) = 2/3
P(submitted|have) = 1/2
P(the|submitted) = 1/1 = 1
P(test|the) = 2/4 = 1/2
P(</s>|test) = 2/2 = 1
P(S2) = 1 * (2/3) * (1/2) * 1 * (1/2) * 1 = 1/6
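The counting above can be checked with a short Python sketch. It assumes whitespace tokenization, case-sensitive matching, and plain maximum-likelihood estimates with no smoothing (so a history word that never occurs in the corpus would raise ZeroDivisionError). The helper names `pad`, `bigram_prob` and `sentence_prob` are hypothetical, chosen only for illustration; `Fraction` keeps the values exact so they can be compared with the hand calculation.

```python
from collections import Counter
from fractions import Fraction

# Training corpus from the question.
corpus = [
    "You have five minutes remaining till the end of the test",
    "You have submitted the test",
    "You are given five marks for the correct answer",
]

def pad(sentence):
    """Assumption: whitespace tokenization; add begin/end sentence markers."""
    return ["<s>"] + sentence.split() + ["</s>"]

# Count unigrams and bigrams over the padded corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = pad(sentence)
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """MLE estimate P(word | prev) = Count(prev, word) / Count(prev), no smoothing."""
    return Fraction(bigrams[(prev, word)], unigrams[prev])

def sentence_prob(sentence):
    """Multiply the bigram probabilities along the padded sentence."""
    tokens = pad(sentence)
    p = Fraction(1)
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(bigram_prob("the", "test"))                     # 1/2
print(sentence_prob("You have submitted the test"))  # 1/6
```

Each factor printed by `bigram_prob` can be matched against the corresponding line of the hand calculation, for example `bigram_prob("You", "have")` gives 2/3.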
What is perplexity? Perplexity is the inverse probability of the test set, normalized by the number of words. It is an intrinsic evaluation method.
The perplexity (PP) of the test sentence is computed with the following equation:
$PP(W) = P(w_1 w_2 w_3 \ldots w_N)^{-1/N}$
where $w_1, w_2, \ldots, w_N$ are the words in the test set and N is the total count of word tokens in the test set, including </s> but excluding <s>. For S2 the counted tokens are You, have, submitted, the, test and </s>, so N = 6:
$PP(S2) = (1/6)^{-1/6} = \frac{1}{(1/6)^{1/6}} = 6^{1/6} \approx 1.35$
Reason for including begin and end sentence markers <s> and </s>
Since this sequence will cross many sentence boundaries, we need to include the begin- and end-sentence markers <s> and </s> in the probability computation. We also need to include the end-of-sentence marker </s> (but not the beginning-of-sentence marker <s>) in the total count of word tokens N.
Source: Speech and Language Processing by Daniel Jurafsky and James H. Martin
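The perplexity formula can be sketched the same way. The hypothetical `perplexity` helper below assumes it receives one conditional probability per predicted token (every word plus </s>, nothing for <s>), so N is simply the length of the list. It sums log probabilities instead of multiplying raw probabilities, a common choice that avoids numerical underflow on long test sets; the result equals $P(w_1 \ldots w_N)^{-1/N}$.

```python
import math

def perplexity(bigram_probs):
    """PP(W) = P(w1..wN) ** (-1/N), computed in log space.

    bigram_probs: one conditional probability per predicted token
    (each word and </s>, but not <s>), so N = len(bigram_probs).
    Assumes every probability is > 0 (no smoothing is applied here).
    """
    n = len(bigram_probs)
    log_prob = sum(math.log(p) for p in bigram_probs)
    return math.exp(-log_prob / n)

# P(You|<s>), P(have|You), P(submitted|have), P(the|submitted), P(test|the), P(</s>|test)
s2_probs = [1, 2/3, 1/2, 1, 1/2, 1]
print(round(perplexity(s2_probs), 2))  # 1.35
print(round((1/6) ** (-1/6), 2))       # 1.35, same value from the closed form
```

For a test set of several sentences, the per-sentence probability lists would be concatenated before the call, so each </s> adds one to N, in line with the note on sentence markers.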