Multiple Choice Questions in NLP - SET 17
1. Using TF-IDF (Term Frequency - Inverse Document Frequency) values for features in a uni-gram bag-of-words model should have an effect most similar to which of the following?
a) Lowercasing the data
b) Dropout regularization
c) Removing stop words
d) Increasing the learning rate
Answer: (c) Removing stop words
TF-IDF is a statistical measure that evaluates how relevant a word is to a document within a collection of documents. It is computed by multiplying two metrics: how many times a word appears in a document (term frequency, TF) and the inverse document frequency (IDF) of the word across the collection. When raw term frequency alone is used as the feature value, higher weights tend to be assigned to words that appear frequently throughout the corpus, such as stop words. IDF corrects for this by assigning a lower weight to frequent words: it is calculated as the log of the ratio of the number of documents in the training corpus to the number of documents containing the given word. Combining the two into TF-IDF therefore places greater importance on words that are frequent in a particular document but rare in the corpus, which has an effect most similar to removing stop words.
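To see this effect concretely, here is a minimal Python sketch that computes TF-IDF by hand on a made-up three-document corpus (the documents are illustrative assumptions, not part of the question). A word that occurs in every document gets IDF = log(N/N) = 0, so its weight vanishes, much like removing a stop word.

import math
from collections import Counter

# Toy corpus (illustrative only)
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and dogs are pets",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

# Document frequency: number of documents containing each word
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tfidf(tokens):
    tf = Counter(tokens)
    # TF-IDF(w) = TF(w) * log(N / DF(w))
    return {w: tf[w] * math.log(N / df[w]) for w in tf}

for tokens in tokenized:
    print(tfidf(tokens))
# 'the' occurs in all 3 documents, so its IDF = log(3/3) = 0 and its weight is 0,
# while rarer content words keep non-zero weights.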
2. Suppose you have the following training data for Naïve Bayes:
I liked the dish [LABEL = POS]
I disliked the dish because it contains sugar [LABEL = NEG]
Really tasty dish [LABEL = POS]
What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(POS) for this data?
a) 1/3
b) 1/2
c) 1
d) 2/3
Answer: (d) 2/3
P(POS) = (number of training examples labeled POS) / (total number of training examples) = 2/3. The unsmoothed MLE is simply the relative frequency observed in the training data; it applies no smoothing, so any outcome never seen in training would receive probability zero.
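A short Python check of this estimate (the three sentences are copied from the question):

training = [
    ("I liked the dish", "POS"),
    ("I disliked the dish because it contains sugar", "NEG"),
    ("Really tasty dish", "POS"),
]
# Unsmoothed MLE of the class prior: relative frequency of POS labels
n_pos = sum(1 for _, label in training if label == "POS")
print(n_pos / len(training))  # 2/3 = 0.666...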
3. Use the data given in question (2) to answer this question. What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(dish|POS)?
a) 2/17
b) 1/5
c) 2/7
d) 1/2
Answer: (c) 2/7
The POS-labeled sentences contain 7 word tokens in total, 2 of which are 'dish'. P(dish|POS) = (number of times 'dish' appears in POS-labeled sentences) / (total number of word tokens in POS-labeled sentences) = 2/7.
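The same count can be verified with a few lines of Python (whitespace tokenization and lowercasing are assumptions; the question's sentences are used as given):

from collections import Counter

training = [
    ("I liked the dish", "POS"),
    ("I disliked the dish because it contains sugar", "NEG"),
    ("Really tasty dish", "POS"),
]
# Collect all word tokens from the POS-labeled sentences
pos_tokens = [w.lower()
              for text, label in training if label == "POS"
              for w in text.split()]
# Unsmoothed MLE: count of 'dish' in POS tokens / total POS tokens
print(len(pos_tokens), Counter(pos_tokens)["dish"] / len(pos_tokens))  # 7 tokens, 2/7 ≈ 0.2857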