
# Multiple choice questions in NLP, Natural Language Processing solved MCQ, What is perplexity, how to calculate perplexity, evaluating language model, intrinsic vs extrinsic evaluation

## Natural Language Processing MCQ - Find the perplexity of a language model

1. Consider the following corpus:

S1: You have five minutes remaining till the end of the test

S2: You have submitted the test

S3: You are given five marks for the correct answer

Suppose that sentence S2 of the given corpus is the test set. What is the perplexity of the test set? Assume that a bigram language model, estimated from the whole corpus, is being used.

(a) 1.14

(b) 1.42

(c) 1.35

(d) 1.43

Answer: (c) 1.35

Let us find the probability of S2. With the begin- and end-sentence markers `<s>` and `</s>` added, the bigram model gives:

P(S2) = P(`<s>` You have submitted the test `</s>`) = P(You | `<s>`) × P(have | You) × P(submitted | have) × P(the | submitted) × P(test | the) × P(`</s>` | test)

P(You | `<s>`) = Count(`<s>` You) / Count(`<s>`) = 3/3 = 1

[Hint: Numerator: the bigram "`<s>` You" occurs 3 times in the corpus, because "You" starts all three sentences. Denominator: the unigram `<s>` (start symbol) occurs 3 times in the corpus.]

P(have | You) = 2/3

P(submitted | have) = 1/2

P(the | submitted) = 1/1 = 1

P(test | the) = 2/4 = 1/2

P(`</s>` | test) = 2/2 = 1

P(S2) = 1 × (2/3) × (1/2) × 1 × (1/2) × 1 = 1/6

What is perplexity?

Perplexity is the inverse probability of the test set, normalized by the number of words. It is an intrinsic evaluation method.

The perplexity (PP) of the test sentence can be computed with the following equation:

$$PP(W) = P(w_1 w_2 w_3 \ldots w_N)^{-1/N}$$

Here w1, w2, … are the words in the test set and N is the total count of word tokens in the test set, including `</s>` (but excluding `<s>`). For S2, N = 6: five words plus `</s>`.

$$PP(S2) = (1/6)^{-1/6} = \frac{1}{(1/6)^{1/6}} \approx 1.35$$

Reason for including the begin- and end-sentence markers `<s>` and `</s>`

Since this sequence will cross many sentence boundaries, we need to include the begin- and end-sentence markers `<s>` and `</s>` in the probability computation. We also need to include the end-of-sentence marker `</s>` (but not the beginning-of-sentence marker `<s>`) in the total count of word tokens N.

Source: Speech and Language Processing by Daniel Jurafsky and James H. Martin
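The calculation above can be checked with a short script. This is only a minimal sketch, not code from the original post: the function names (`tokenize`, `train_bigram_counts`, `sentence_probability`, `perplexity`) are made up for illustration, and it assumes whitespace tokenization, case-sensitive matching, and unsmoothed maximum-likelihood bigram estimates, which is what the hand calculation uses.

```python
from collections import Counter

# The three-sentence corpus from the question.
corpus = [
    "You have five minutes remaining till the end of the test",
    "You have submitted the test",
    "You are given five marks for the correct answer",
]


def tokenize(sentence):
    """Split on whitespace and wrap the sentence in <s> ... </s> markers."""
    return ["<s>"] + sentence.split() + ["</s>"]


def train_bigram_counts(sentences):
    """Count bigrams and the contexts (first tokens) they start from."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        contexts.update(tokens[:-1])             # </s> never acts as a context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def sentence_probability(sentence, contexts, bigrams):
    """Unsmoothed MLE bigram probability of one sentence."""
    tokens = tokenize(sentence)
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: probability is zero without smoothing
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob


def perplexity(sentence, contexts, bigrams):
    """PP = P(sentence)^(-1/N), where N counts the words plus </s> but not <s>."""
    n = len(sentence.split()) + 1
    prob = sentence_probability(sentence, contexts, bigrams)
    if prob == 0.0:
        return float("inf")
    return prob ** (-1.0 / n)


contexts, bigrams = train_bigram_counts(corpus)
p_s2 = sentence_probability(corpus[1], contexts, bigrams)
pp_s2 = perplexity(corpus[1], contexts, bigrams)
print(f"P(S2) = {p_s2:.4f}")    # P(S2) = 0.1667
print(f"PP(S2) = {pp_s2:.2f}")  # PP(S2) = 1.35
```

Because no smoothing is applied, any test sentence containing a bigram that never occurs in the corpus gets probability 0 and infinite perplexity; here the test sentence is drawn from the corpus itself, so the problem does not arise.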


