
Sunday, October 17, 2021

Natural Language Processing (NLP) Multiple Choice Questions with answers 19

Top 3 MCQ on NLP, NLP quiz questions with answers, NLP MCQ questions, Solved questions in natural language processing, NLP practitioner exam questions, MLE, Smoothing, Laplace smoothing


Multiple Choice Questions in NLP - SET 19

1. Which of the following is not a problem when using Maximum Likelihood Estimation to obtain parameters in a language model?

a) Unreliable estimates where there is little training data

b) Out-of-vocabulary terms

c) Overfitting

d) Smoothing

Answer: (d) Smoothing

Options (a) to (c) are possible problems when we use MLE in a language model.

Relative frequency estimation assigns all probability mass to events seen in the training corpus. Smoothing is not a problem but a remedy: it adjusts the maximum likelihood estimates to reserve probability mass for unseen events, such as unknown words, and thereby produces more accurate probabilities.
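As a minimal sketch (my own illustration, using the toy corpus from the smoothing post below), the following Python snippet shows how plain MLE assigns zero probability to out-of-vocabulary words:

from collections import Counter

train = "the cow is an animal".split()  # toy training corpus
counts = Counter(train)
N = len(train)  # N = 5 tokens

def p_mle(word):
    # Relative frequency (maximum likelihood) estimate: count(w) / N
    return counts[word] / N

print(p_mle("cow"))  # 0.2 -- seen in training
print(p_mle("dog"))  # 0.0 -- out-of-vocabulary word gets zero probability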

 

2. Which of the following is the main advantage of neural transition-based dependency parsers over non-neural transition-based dependency parsers?

a) It chooses transitions using more words in the stack and buffer

b) It generates a larger class of dependency parses

c) It relies on dense feature representations

d) It models a grammar whereas traditional parsers do not

 

Answer: (c) It relies on dense feature representations

The main advantage of neural dependency parsers is that they use dense feature representations instead of the sparse, hand-engineered indicator features used by traditional parsers.

Neural and traditional parsers do not differ in what input information they can use or in what kinds of parses they can output (both can output any parse); they differ in how they represent the features they use. [Stanford question]

Dependency parsing is the process of analyzing the grammatical structure of a sentence based on the dependencies between its words.
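To make the sparse-versus-dense contrast concrete, here is a small illustrative sketch (the vocabulary and variable names are my own, not from the Stanford material): a traditional parser describes a configuration with high-dimensional indicator features, while a neural parser embeds the same words as low-dimensional dense vectors:

import numpy as np

vocab = {"the": 0, "cow": 1, "is": 2, "an": 3, "animal": 4}

# Sparse indicator feature: "word on top of the stack is 'cow'".
# One-hot over the vocabulary -- almost all entries are zero.
sparse = np.zeros(len(vocab))
sparse[vocab["cow"]] = 1.0

# Dense feature: the same word as a small learned embedding
# (random numbers here stand in for trained parameters).
emb_dim = 4
embeddings = np.random.randn(len(vocab), emb_dim)
dense = embeddings[vocab["cow"]]

print(sparse)  # [0. 1. 0. 0. 0.]
print(dense)   # a 4-dimensional real-valued vector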

 

3. Which of the following equations is used to find the unigram probabilities using Add-1 smoothing?

a) Count (wi)/N

b) Count (wi)/(N+1)  

c) (Count (wi)+1)/(N+1)  

d) (Count (wi)+1)/(N+V)

 

Answer: (d) (Count (wi)+1)/(N+V)

Smoothing adjusts probability estimates to handle unseen events such as unknown words. In Add-1 (Laplace) smoothing, we add 1 to every n-gram count in the training set before normalizing into probabilities.

Since we add 1 to the numerator (to every unigram count), we must also enlarge the denominator so that the probabilities still sum to 1. Each of the V unique words gains one extra count, so V, the size of the vocabulary, is added to the denominator.
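A minimal sketch of the formula in Python (my own illustration; the toy corpus is the one used in the smoothing post below):

from collections import Counter

train = "the cow is an animal".split()
counts = Counter(train)
N = len(train)   # total number of tokens (5)
V = len(counts)  # vocabulary size (5 unique words)
# (strictly, V should also count any unseen types we want to
# license, e.g. via an UNK symbol)

def p_add1(word):
    # Add-1 (Laplace) smoothed unigram probability: (count(w) + 1) / (N + V)
    return (counts[word] + 1) / (N + V)

print(p_add1("cow"))  # (1 + 1) / (5 + 5) = 0.2
print(p_add1("dog"))  # (0 + 1) / (5 + 5) = 0.1 -- unseen word, non-zero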

 

 
*************



Top interview questions in NLP

NLP quiz questions with answers explained

Add-1 smoothing for unigram model

question and answers in natural language processing

Language model smoothing

Maximum likelihood estimation and smoothing

Sunday, February 28, 2021

What is smoothing in NLP and why do we need it

What is smoothing in the context of natural language processing, define smoothing in NLP, what is the purpose of smoothing in nlp, is smoothing an important task in language model

Smoothing in NLP

Smoothing is the process of flattening a probability distribution implied by a language model so that all reasonable word sequences can occur with some probability. This often involves broadening the distribution by redistributing weight from high probability regions to zero probability regions.

Smoothing not only prevents zero probabilities but also attempts to improve the accuracy of the model as a whole.

Why do we need smoothing?

In a language model, we estimate parameters from training data by maximum likelihood estimation (MLE). We cannot reliably evaluate an MLE model on unseen test data, because the test data is likely to contain words or n-grams to which the model assigns zero probability. Relative frequency estimation assigns all probability mass to events seen in the training corpus, but we need to reserve some probability mass for events that do not occur in the training data (unseen events).

Example:

Training data: The cow is an animal.

Test data: The dog is an animal.

If we train a unigram model on the training data:

P(the) = count(the)/(Total number of words in training set) = 1/5.

Likewise, P(cow) = P(is) = P(an) = P(animal) = 1/5

Evaluating the unigram model on the training sentence:

P(the cow is an animal) = P(the) * P(cow) * P(is) * P(an) * P(animal) = 0.00032

 

When we apply the unigram model to the test data, the sentence probability becomes zero because P(dog) = 0; the term ‘dog’ never occurred in the training data. Hence, we use smoothing.
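Continuing the same example in code (a sketch of my own, not from the post): the test sentence gets probability zero under plain MLE but a small non-zero probability with Add-1 smoothing.

from collections import Counter

train = "the cow is an animal".split()
test = "the dog is an animal".split()
counts = Counter(train)
N = len(train)
V = len(counts) + 1  # vocabulary plus one extra type for the unseen word 'dog'

def sentence_prob(sentence, smooth):
    p = 1.0
    for w in sentence:
        if smooth:
            p *= (counts[w] + 1) / (N + V)  # Add-1 smoothing
        else:
            p *= counts[w] / N              # plain MLE
    return p

print(sentence_prob(test, smooth=False))  # 0.0, because P(dog) = 0
print(sentence_prob(test, smooth=True))   # ~9.9e-05, small but non-zero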

 

****************

Explain the concept of smoothing in NLP

Why do we need smoothing

What is the advantage of smoothing the data in language models



Friday, December 18, 2020

What is lemmatization in natural language processing

What is lemmatization in NLP? Define lemmatization, Lemmatization example

Lemmatization

In a language, a word is usually inflected to form new words, especially to mark distinctions such as tense, person, number, gender, mood, voice, and case. In linguistics, lemmatization is the process of removing those inflections from a word in order to identify the lemma (the dictionary form of the word). A dictionary word (lemma / root word) is inflected into various words having the same base meaning, or different meanings, by adding one or more morphemes (both free and bound). Through lemmatization, we remove the bound morphemes.

Lemmatization refers to doing things algorithmically with the use of a vocabulary and morphological analysis of words, aiming to remove inflections only and to return the base or dictionary form of a word, which is known as the lemma.

Inflected word → Removal of morphemes → Lemma

Example:

Inflected word    Morphemes    Lemma
Runs              ‘s’          Run
Studies           ‘ies’        Study
Opened            ‘ed’         Open
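For reference, here is a short example using NLTK’s WordNetLemmatizer (one common implementation; the post itself does not prescribe a particular library):

import nltk
nltk.download("wordnet", quiet=True)  # WordNet data needed by the lemmatizer
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("runs"))             # run   (default POS is noun)
print(lemmatizer.lemmatize("studies"))          # study
print(lemmatizer.lemmatize("opened", pos="v"))  # open  (needs the verb POS)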

 

******************


 

Define lemmatization

What is lemmatization

What is lemma
