
Thursday, December 17, 2020

Natural Language Processing (NLP) Multiple Choice Questions with answers 18

Top 3 MCQ on NLP, NLP quiz questions with answers, NLP MCQ questions, Solved questions in natural language processing, NLP practitioner exam questions, Unsmoothed MLE, TF-IDF


Multiple Choice Questions in NLP - SET 18

 

1. Which of the following smoothing techniques is most complex?

a) Add-1 smoothing

b) Add-k smoothing

c) Witten-Bell smoothing

d) Good-Turing smoothing

Answer: (d) Good-Turing smoothing

Good-Turing smoothing – The basic idea is to use the total frequency of events that occur only once to estimate how much probability mass to shift to unseen events. In other words, use the count of things seen once to help estimate the count of things never seen.

Witten-Bell smoothing - The probability of seeing a zero-frequency N-gram can be modeled by the probability of seeing an N-gram for the first time.
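A minimal sketch of the Good-Turing idea on hypothetical toy counts (the words and numbers below are made up purely for illustration):

```python
from collections import Counter

# Hypothetical toy unigram counts, just to illustrate the Good-Turing idea
counts = Counter({"the": 3, "dish": 2, "tasty": 2, "sugar": 1, "really": 1, "liked": 1})
N = sum(counts.values())       # total observed tokens (10)
Nc = Counter(counts.values())  # N_c = number of word types seen exactly c times

# Total probability mass shifted to unseen events = N_1 / N
p_unseen_total = Nc[1] / N

# Discounted count for an event seen c times: c* = (c + 1) * N_{c+1} / N_c
def gt_adjusted_count(c):
    if Nc[c] == 0 or Nc[c + 1] == 0:
        return c               # fall back to the raw count when counts-of-counts are missing
    return (c + 1) * Nc[c + 1] / Nc[c]

print(p_unseen_total)          # 0.3: the mass of the singletons is reserved for unseen words
print(gt_adjusted_count(1))    # ~1.33: counts of seen events are discounted to pay for it
```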

 

2. Which of the following smoothing techniques assigns too much probability to unseen events?

a) Add-1 smoothing

b) Add-k smoothing

c) Witten-Bell smoothing

d) Good-Turing

Answer: (a) Add-1 smoothing

Add-1 smoothing assumes every (seen or unseen) event occurred once more than it did in the training data. Add-1 moves too much probability mass from seen to unseen events.

In effect, add-one smoothing treats novel events as far more likely than they really are, at the expense of the words we have actually seen.
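A small sketch of why add-1 over-weights unseen events, using toy counts and an assumed vocabulary size (both made up for illustration):

```python
from collections import Counter

# Hypothetical toy unigram counts and an assumed vocabulary size of 10,000
counts = Counter({"the": 500, "dish": 20, "tasty": 5})
N = sum(counts.values())   # 525 observed tokens
V = 10_000                 # assumed vocabulary size, including never-seen words

def p_mle(word):
    return counts[word] / N              # unsmoothed MLE: unseen words get probability 0

def p_add1(word):
    return (counts[word] + 1) / (N + V)  # add-1: pretend every word occurred once more

print(p_mle("never-seen-word"), round(p_add1("never-seen-word"), 6))

# With a large vocabulary, most of the probability mass ends up on unseen words
mass_on_unseen = (V - len(counts)) * p_add1("never-seen-word")
print(round(mass_on_unseen, 2))          # ~0.95 of the mass goes to words never observed
```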

 

3. In add-k smoothing method, for a small k value, what would be perplexity?

a) High perplexity

b) Zero perplexity

c) Low perplexity

d) Perplexity is not disturbed

Answer: (a) High perplexity

In add-k smoothing, when k is small, unseen words receive very small probabilities, which causes high perplexity on test data that contains such words.

Perplexity - The perplexity of a language model on a test set is the inverse probability of the test set, normalized by the number of words. It is used for evaluating language models.
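A rough sketch of this effect on a hypothetical unigram model (all counts and the test sentence are made up; only the trend matters):

```python
import math

# Hypothetical unigram model: trained on N tokens with vocabulary size V
N, V = 1_000, 5_000
train_count = {"the": 60, "dish": 10}        # counts for two example words

def p_addk(word, k):
    return (train_count.get(word, 0) + k) / (N + k * V)

# Test set containing one word never seen in training
test = ["the", "dish", "quinoa"]

def perplexity(k):
    log_prob = sum(math.log(p_addk(w, k)) for w in test)
    return math.exp(-log_prob / len(test))   # inverse probability, normalized per word

for k in (0.001, 0.01, 0.1):
    print(k, round(perplexity(k)))
# The smaller k is, the tinier the unseen word's probability and the higher the perplexity
```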

 

*************



Top interview questions in NLP

NLP quiz questions with answers explained

Online NLP quiz with solutions

Questions and answers in natural language processing

Language model smoothing

How does the k value affect perplexity in add-k smoothing

Sunday, November 29, 2020

Natural Language Processing (NLP) Multiple Choice Questions with answers 17

Top 3 MCQ on NLP, NLP quiz questions with answers, NLP MCQ questions, Solved questions in natural language processing, NLP practitioner exam questions, Unsmoothed MLE, TF-IDF


Multiple Choice Questions in NLP - SET 17

1. Using TF-IDF (Term Frequency - Inverse Document Frequency) values for features in a uni-gram bag-of-words model should have an effect most similar to which of the following?

a) Lowercasing the data

b) Dropout regularization

c) Removing stop words

d) Increasing the learning rate

Answer: (c) Removing stop words

TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents. This is done by multiplying two metrics: how many times a word appears in a document, and the inverse document frequency of the word across a set of documents.

When the metric word frequency of occurrence (TF) in a document is used as a feature value, a higher weight tends to be assigned to words that appear frequently in a corpus (such as stop words). The inverse document frequency (IDF) is a better metric, because it assigns a lower weight to frequent words. IDF is calculated as the log of the ratio of the number of documents in the training corpus to the number of documents containing the given word. Combining the two metrics by multiplication (TF × IDF) places greater importance on words that are frequent in the document but rare in the corpus.
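A minimal sketch of TF-IDF computed by hand, using one common definition of TF and IDF (the corpus is the toy data from question 2 below):

```python
import math

# Tiny toy corpus (the sentences from question 2 below), tokenized and lowercased
docs = [
    ["i", "liked", "the", "dish"],
    ["i", "disliked", "the", "dish", "because", "it", "contains", "sugar"],
    ["really", "tasty", "dish"],
]
N = len(docs)

def tf(term, doc):
    return doc.count(term) / len(doc)        # term frequency within one document

def idf(term):
    df = sum(1 for d in docs if term in d)   # number of documents containing the term
    return math.log(N / df)

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)         # TF multiplied by IDF

# 'dish' appears in every document, so its IDF (hence its TF-IDF weight) is 0,
# much like the effect of discarding a stop word.
print(tf_idf("dish", docs[0]), tf_idf("sugar", docs[1]))
```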


2. Suppose you have the following training data for Naïve Bayes:

I liked the dish [LABEL = POS]

I disliked the dish because it contains sugar [LABEL = NEG]

Really tasty dish [LABEL = POS]

What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(POS) for this data?

a) 1/3

b) 1/2

c) 1

d) 2/3

Answer: (d) 2/3

P(POS) = (Number of training data labeled as POS)/(total no. of training data) = 2/3

* Unsmoothed MLE ignores the fact that some words are more frequent than others in a language.

 

 

3. Use the data given in question (2) to answer this question. What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(dish |POS)?

a) 2/17

b) 1/5

c) 2/7

d) 1/2

Answer: (c) 2/7

There are 7 word tokens in the POS-labeled samples, and 2 of them are 'dish'.

P(dish|POS) = (No. of times ‘dish’ appears in dataset labeled POS)/(total no. of words appear in the POS dataset) = 2/7.
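Both estimates (questions 2 and 3) can be checked with a few lines of counting; a minimal sketch in Python:

```python
from collections import Counter

# Training data from question (2)
data = [
    ("i liked the dish", "POS"),
    ("i disliked the dish because it contains sugar", "NEG"),
    ("really tasty dish", "POS"),
]

# Unsmoothed MLE prior: fraction of documents labeled POS
labels = [label for _, label in data]
p_pos = labels.count("POS") / len(labels)                          # 2/3

# Unsmoothed MLE likelihood: count of 'dish' in POS documents / total tokens in POS documents
pos_tokens = [w for text, label in data if label == "POS" for w in text.split()]
p_dish_given_pos = Counter(pos_tokens)["dish"] / len(pos_tokens)   # 2/7

print(round(p_pos, 3), round(p_dish_given_pos, 3))                 # 0.667 0.286
```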

 

*************




Top interview questions in NLP

NLP quiz questions with answers explained

Online NLP quiz with solutions

Questions and answers in natural language processing

Unsmoothed MLE and how to calculate it

TF-IDF example solved exercise in natural language processing
