Saturday, December 5, 2020

Machine Learning TRUE or FALSE Questions with Answers 18

Machine learning exam questions, ML solved quiz questions, Machine Learning TRUE or FALSE questions, TOP 5 machine learning quiz questions with answers

Machine Learning TRUE / FALSE Questions - SET 18

1. For linearly separable data, a small slack penalty ("C") can hurt the training accuracy when using a linear SVM without a kernel.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

If the optimal values of the α's (in the dual formulation) would need to exceed C, they are clipped at C and we may end up with a sub-optimal decision boundary with respect to the training examples. Alternatively, a small C allows large slacks, so the resulting classifier will have a small ‖w‖² but can have non-zero training error.

 

C is a regularization parameter that controls the trade-off between achieving a low training error and a low testing error, i.e., the ability of the classifier to generalize to unseen data. If C is too small, the objective function is almost free to ignore the slack penalties: it keeps ‖w‖ small (a wide margin) at the cost of many margin violations, which can lead to a large training error.

The C parameter thus controls the tolerance for outliers: a low C allows more margin violations (outliers), while a high C allows fewer.
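To make this concrete, here is a minimal sketch (using scikit-learn and a made-up, imbalanced but linearly separable toy set) in which a very small C can leave training errors:

```python
# Minimal sketch: effect of the slack penalty C on training accuracy.
# Assumes scikit-learn; the toy data below are made up for illustration.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Linearly separable but imbalanced: 30 negatives, 5 positives
X = np.vstack([rng.randn(30, 2) - [3, 3], rng.randn(5, 2) + [3, 3]])
y = np.array([-1] * 30 + [1] * 5)

for C in [1e-4, 1e-2, 1.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C:g}: training accuracy = {clf.score(X, y):.2f}")

# With a tiny C, slack is cheap relative to the (1/2)*||w||^2 term, so the
# optimizer shrinks w towards zero and may simply predict the majority
# class, giving non-zero training error even though the data are separable.
```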

 

2. Ridge regression, weight decay, and Gaussian processes use the same regularizer.

(a) TRUE                                                   (b) FALSE

Answer: TRUE

Ridge regression, weight decay, and Gaussian processes use the same regularizer, ‖w‖².

Regularization

In the context of machine learning, regularization is a technique that shrinks the model coefficients towards zero. In simple words, regularization discourages learning an overly complex or flexible model, in order to prevent overfitting.

Regularization may be defined as any change we make to the training algorithm in order to reduce the generalization error but not the training error.

Ridge regression is like least-squares regression with an additional penalty term ‖w‖².

Weight decay means shrinking the weights by a small multiplicative factor at every learning step; for gradient descent this is equivalent to adding an L2 penalty ‖w‖² to the objective.

For a linear model, a Gaussian process is a generative model in which the weights of the target function are drawn from a Gaussian distribution; the zero-mean Gaussian prior on the weights plays the role of the same ‖w‖² penalty.
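As a concrete illustration, below is a minimal numpy sketch (with made-up toy data) of ridge regression in closed form; the λ‖w‖² penalty simply adds λI to the normal equations and shrinks the coefficients:

```python
# Minimal sketch: ridge regression = least squares + lambda * ||w||^2.
# The data are synthetic, made up for illustration.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(50)

lam = 1.0  # strength of the ||w||^2 penalty
# Closed form: w = (X^T X + lambda * I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("OLS  :", np.round(w_ols, 3))
print("Ridge:", np.round(w_ridge, 3))  # shrunk towards zero
```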

 

3. Linear soft-margin SVM can only be used when training data are linearly separable.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

Hard-margin SVM works only when the data are completely linearly separable, without any errors (noise or outliers). In the presence of errors, either the margin becomes smaller or the hard-margin SVM fails entirely. Soft-margin SVM was proposed to solve this problem by introducing slack variables; it is an extended version of the hard-margin SVM.
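A minimal sketch (scikit-learn, with overlapping synthetic clusters made up for illustration) showing that a linear soft-margin SVM still trains happily on data that are not linearly separable:

```python
# Minimal sketch: soft-margin SVM on non-separable (overlapping) toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clusters: no hyperplane separates them perfectly
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + [1.5, 1.5]])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)  # slack variables absorb errors
print("training accuracy:", clf.score(X, y))  # typically below 1.0,
# yet the optimization succeeds: soft margins tolerate margin violations.
```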

 

4. In linear regression, using an L2 regularization penalty term results in sparser solutions than using an L1 regularization penalty term.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

In linear regression, using an L1 regularization penalty term results in sparser solutions than using an L2 regularization penalty term.

 

L1 regularization adds an L1 penalty equal to the absolute value of the magnitude of coefficients. In other words, it limits the size of the coefficients. L1 can yield sparse models (i.e. models with few coefficients).

L2 regularization adds an L2 penalty equal to the square of the magnitude of the coefficients. L2 does not yield sparse models: all coefficients are shrunk towards zero, but rarely exactly to zero.
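The contrast is easy to see in a minimal sketch (scikit-learn, with synthetic data in which only a few features matter; everything below is made up for illustration):

```python
# Minimal sketch: L1 (Lasso) yields sparse coefficients, L2 (Ridge) does not.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
w_true = np.zeros(10)
w_true[:3] = [3.0, -2.0, 1.5]          # only 3 of 10 features are relevant
y = X @ w_true + 0.1 * rng.randn(100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("L1 zero coefficients:", int(np.sum(lasso.coef_ == 0)))  # typically several
print("L2 zero coefficients:", int(np.sum(ridge.coef_ == 0)))  # typically none
```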

 

5. Maximum likelihood estimation gives us not only a point estimate, but a distribution over the parameters that we are estimating.

(a) TRUE                                                   (b) FALSE

Answer: FALSE

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.

MLE is a method of estimating the parameters of a statistical model by picking the parameters that maximize the likelihood function. It returns a single point estimate for each parameter, not a distribution over the parameters; a distribution over parameters is what Bayesian inference provides via the posterior.
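A small numpy sketch (on a made-up Gaussian sample) makes the contrast concrete; the MLE is one number per parameter, not a distribution:

```python
# Minimal sketch: the MLE of Gaussian parameters is a point estimate.
import numpy as np

rng = np.random.RandomState(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # made-up sample

mu_mle = data.mean()      # MLE of the mean: the sample mean
sigma_mle = data.std()    # MLE of the std (note: divides by n, not n-1)
print(f"MLE: mu = {mu_mle:.3f}, sigma = {sigma_mle:.3f}")
# A Bayesian treatment would instead return a posterior distribution
# over mu and sigma; MLE returns only these single values.
```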

 

*********************

Related links:

 

Maximum Likelihood Estimation

L1 and L2 regularization

Difference between hard-margin and soft-margin SVM

Regularization in ridge regression

What is slack variable

Differentiate between L1 and L2 regularization 

Sunday, November 29, 2020

Natural Language Processing (NLP) Multiple Choice Questions with answers 17

Top 3 MCQ on NLP, NLP quiz questions with answers, NLP MCQ questions, Solved questions in natural language processing, NLP practitioner exam questions, Unsmoothed MLE, TFIDF


Multiple Choice Questions in NLP - SET 17

1. Using TF-IDF (Term Frequency - Inverse Document Frequency) values for features in a uni-gram bag-of-words model should have an effect most similar to which of the following?

a) Lowercasing the data

b) Dropout regularization

c) Removing stop words

d) Increasing the learning rate

Answer: (c) Removing stop words

TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents. This is done by multiplying two metrics: how many times a word appears in a document, and the inverse document frequency of the word across a set of documents.

When the raw frequency of a word in a document (TF) is used as a feature value, a higher weight tends to be assigned to words that appear frequently throughout the corpus (such as stop words). The inverse document frequency (IDF) corrects for this by assigning a lower weight to frequent words: IDF is calculated as the log of the ratio of the number of documents in the training corpus to the number of documents containing the given word. Combining the two into one metric (TF × IDF) places greater importance on words that are frequent in a given document but rare in the corpus, which is why its effect resembles removing stop words.
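As a small illustration (using scikit-learn's TfidfVectorizer on a toy corpus made up for this example), a word that occurs in every document receives the lowest IDF weight, which mimics stop-word removal:

```python
# Minimal sketch: ubiquitous words get the lowest IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog ran", "the cat ran"]
vec = TfidfVectorizer()
vec.fit(docs)

for word, idx in sorted(vec.vocabulary_.items()):
    print(f"{word:>4}: idf = {vec.idf_[idx]:.3f}")
# "the" appears in all three documents, so its idf (and hence its tf-idf
# weight) is the smallest; rarer words are weighted up.
```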


2. Suppose you have the following training data for Naïve Bayes:

I liked the dish [LABEL = POS]

I disliked the dish because it contains sugar [LABEL = NEG]

Really tasty dish [LABEL = POS]

What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(POS) for this data?

a) 1/3

b) 1/2

c) 1

d) 2/3

Answer: (d) 2/3

P(POS) = (Number of training examples labeled POS)/(total number of training examples) = 2/3

* Unsmoothed MLE uses raw relative frequencies directly; in particular, it assigns zero probability to any event never seen in the training data.
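For concreteness, the count ratio as a tiny Python sketch (data taken from the question):

```python
# Unsmoothed MLE of the class prior P(POS).
labels = ["POS", "NEG", "POS"]
p_pos = labels.count("POS") / len(labels)
print(p_pos)  # 2/3 = 0.666...
```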

 

 

3. Use the data given in question (2) to answer this question. What is the unsmoothed Maximum Likelihood Estimate (MLE) of P(dish |POS)?

a) 2/17

b) 1/5

c) 2/7

d) 1/2

Answer: (c) 2/7

Seven word tokens occur in the positive samples, and 2 of them are 'dish'.

P(dish|POS) = (No. of times 'dish' appears in documents labeled POS)/(total no. of word tokens in the POS documents) = 2/7.
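And the corresponding sketch for the word likelihood, using the two POS documents from the question:

```python
# Unsmoothed MLE of P(dish | POS): count 'dish' among all POS word tokens.
pos_docs = ["I liked the dish", "Really tasty dish"]
pos_words = " ".join(pos_docs).lower().split()   # 7 word tokens in total
p_dish_pos = pos_words.count("dish") / len(pos_words)
print(p_dish_pos)  # 2/7 = 0.2857...
```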

 

*************




Top interview questions in NLP

NLP quiz questions with answers explained

Online NLP quiz with solutions

question and answers in natural language processing

Unsmoothed MLE and how to calculate?

TFIDF example solved exercise in natural language processing 
